Preface


This is both a first and a second level course in Pascal. It starts at an elementary 
level and works up to a point where problems of realistic complexity can be tackled. 
It is aimed at two audiences: on the one hand the computer professional who has a 
good knowledge of Cobol or Fortran but needs convincing that Pascal is worth 
learning, and on the other hand the amateur computer enthusiast who may have a 
smattering of Basic or may be an absolute beginner. 

Its approach is based on two principles that are not always widely recognized. 

The first is that computing is no longer a specialist subject. In the early days 
of computing a priesthood arose whose function was to minister to those awesome, 
and awesomely expensive, machines. Just as in the ancient world, when illiteracy 
was rife, the scribes formed a priestly caste with special status, so the programmers 
of yesteryear were regarded with reverence. But times are changing: mass
computer-literacy is on its way. We find already that when a computer enters a classroom it
is not long before the pupils are explaining the finer points of its use to their 
teacher — for children seem to have greater programming aptitude than adults. 

This book, it is hoped, is part of that process of education by which the computer 
is brought down to earth; and therefore it attempts to divest computing of the 
mystique (and deliberate mystification) that still tends to surround the subject. 

The other principle is that the second best way to achieve competence as a 
programmer is to read non-trivial programs and see how they work. (The best way 
is of course to write programs, and plenty of them.) So a large proportion of this 
book is taken up by full descriptions and listings of four good-sized programs that 
are far removed from the toy examples sometimes shown in programming textbooks. 
This aspect, based on the fact that people learn by example, is too often neglected 
by authors on programming, who tend to feel they have done their duty once they 
have presented the rules of the language. 

Pascal has been chosen because it is elegant enough to appeal to computer 
scientists and professional programmers while still being simple enough to be 
taught to almost anyone who is genuinely interested in programming. It is also 
compact enough to run on many microcomputers — those Volkswagens of the 
computer age. Indeed there are signs that Pascal will become the lingua franca 
of the coming ‘computer revolution’ in schools, businesses and the home. This 
book is for anyone who wants to join that revolution. 


October 1981 Richard Forsyth 
PART ONE


Pascal at Large 


‘The limits of my language mean the limits of my world.’ 
Ludwig Wittgenstein, Tractatus Logico-Philosophicus


Introduction 


On choosing a programming language 


I have never actually seen anyone come to blows over the choice of a programming 
language, but it is an issue that can inspire heated argument. I have heard high-brow 
Lisp addicts snidely denigrating Algol 68 (not Algol 60, that was even beneath 
contempt) at an Artificial Intelligence symposium; while on another occasion I 
listened meekly to an experienced systems programmer pouring torrents of abuse 
on the benighted fools who persisted with ‘infantile’ and ‘mind-polluting’ languages 
like Fortran and Cobol long after the true way (Algol 68 again) had been revealed 
to the world. In the circumstances I thought it prudent to keep quiet about the 
fact that I used such a demotic language as Basic. 

Indeed few subjects arouse such passions in the data-processing fraternity. If a 
letter appears one week in the correspondence columns of Computer Weekly or 
another journal of that ilk, championing, say, PL/I and describing Cobol, for 
example, in unfavourable terms, it is sure to provoke an immediate storm of protest 
from a legion of Cobol loyalists which will take months to subside. Others will join 
the fray on both sides and the controversy will probably not die down until the 
editor intervenes to halt it. Much the same applies to other languages, each of which 
has its devotees and detractors — some so zealous as to suggest quasi-religious 
fervour. Pascal is particularly prone to inspire missionary zeal. 

Against this background I do not wish to pretend that I can offer a rational 
and unbiased evaluation of the merits of Pascal. Nevertheless I believe it is 
important to persuade you that it has some notable advantages that make it worth 
the effort of learning. The fact that people get emotional about the relative 
strengths and weaknesses of programming languages shows that the question is 
not a trivial one. 

The importance of language in moulding thought has become known in linguistics 
as the ‘Whorfian Hypothesis’ after Benjamin Lee Whorf, a scholar who drew many 
of his conclusions from a study of the different modes of thought between 
American Indian languages and European ones. It is perhaps best expressed by 
Edward Sapir, Whorf’s teacher and colleague (Carroll, 1956). 


‘Human beings do not live in the objective world alone, nor alone in the world of 
social activity as ordinarily understood, but are very much at the mercy of the 
particular language which has become the medium of expression in their society. It
is quite an illusion to imagine that one adjusts to reality essentially without the use 
of language and that language is merely an incidental means of solving specific 
problems of communication or reflection.’ 


Applied to computing, this implies that a programming language is not a passive 
instrument but an active collaborator. It means that when you write a Pascal 
program you are standing on the shoulders of Niklaus Wirth, the inventor of that 
language. And, quite clearly, the further you want to see, the taller should be the 
giant on whose shoulders you stand. For whether you notice it or not, the 
programming language you use lays down a style of approach which necessarily 
constrains its user by making some techniques easy and others difficult. 


Plan of this book 


This book is divided into four parts. The first part describes the rules of the Pascal 
language, and shows how its features may be put to use. It also outlines some 
precepts of good programming practice. 

But merely learning the rules of Pascal is not the same as knowing how to write 
a Pascal program, so the ‘meat’ of the book comes in Parts 2 and 3. Part 2 
concentrates on two medium-to-large programs, one to sort a file of data into 
order, the other to find the shortest path through a network. Both are serious 
problems with many real-life applications and both are explained in detail. These 
programs were written and run on a large mainframe computer (the DEC System-10) 
using a Pascal compiler from the University of Hamburg. 

Computing is not all hard work, however. The microprocessor has liberated the 
computer from the fortified citadel of the commercial data-processing department 
and from the cloistered sanctuary of the big university installation; and now that 
computers have come to the people, people (especially children) find they are fun 
to play with. The personal computers of today already have facilities for colour
graphics and sound output that make games more lively (the best example being
‘Space Invaders’), and the personal computers of tomorrow will have all this and
more.

Thus it is appropriate that the programs in Part 3, which is concerned with the 
more frivolous aspects of computing, were written and tested on the Research 
Machines 380Z, a microcomputer which runs the popular CP/M operating system, 
using Pascal/Z. The first program in Part 3 simulates a soccer game and the second 
plays Go-Moku, an ancient oriental game. Both are significant pieces of work and 
the reader will find complete program listings, printout from trial runs and 
descriptions of methods used to help him or her come to grips with them. Reading 
and understanding programs which are not just ‘Mickey Mouse’ examples is the
fastest way to appreciate the power of a programming language. 

The fact that Pascal can cope with such a variety of tasks is, in my view, one 
of the strongest arguments in its favour. 


The ideal reader will first peruse this book, then use it. Programming is learnt 
by doing. To derive any benefit from the book, you will need to read it in 
conjunction with practical work on a computer. If you have no prospect of access 
to a computer system with Pascal, you may as well stop reading now. 

The crunch will come at the end of Chapter 5. Up to that point it is all reading; 
from then on there are plenty of exercises to keep you on your toes. If you do 
not attempt the exercises (or some equivalent pieces of your own) you will 
gradually lose contact, so that by about half-way through you will not understand 
what you are reading. Even just typing in the example programs and getting them 
to work on your system is better than no programming at all. 

(Typographical note: for clarity all computer output is shown in capitals
throughout the book; user input and program listings appear in lower case.) 
1
Writing programs 


A computer is simply a machine for processing information. Since it cannot answer 
questions but can only obey orders it must be given a sequence of instructions in a 
language it can understand — a program — to make it do anything useful. Someone 
has to write that program; you, for instance. 

There is not time here to give a potted history of computers from the abacus to 
the silicon chip (as some old-fashioned programming texts did) or to delve into the 
electronics of how they work (as some of the newer books do), but a little 
background information about compilers and high-level languages should be useful 
to the prospective Pascal programmer. 

(If you do want to read about computer history and hardware two excellent 
books are Using Computers and The Making of the Micro (Meek and Fairthorne, 
1977; Evans, 1981).) 


1.1 High level languages


The central processing unit of a computer can only obey instructions that are 
encoded as groups of binary digits, termed ‘bits’, i.e. as sequences of zeroes and 
ones, such as 01110110. Such instructions are said to be in machine code. Two-state 
devices are easy to construct (for instance a transistorized switch may be on or off,
a voltage may be high or low, a tiny piece of magnetized material may have polarity
north-south or south-north, a current may be flowing or not), and so binary
code is very convenient for computing machines. But it is extremely inconvenient
for humans, and no one in their right mind writes programs in machine code these
days.

The task of translating from a more symbolic notation to machine code is one 
that can be mechanized, and historically the first programs to carry out this task 
were ‘assemblers’. The main advantage of the assembler is that its user can employ 
symbolic names composed of letters and digits instead of having to remember their 
binary equivalents. The assembler then converts from something like 


add a,100 
to something like
11000110 01100100 


(Zilog Z-80 example). 
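
The translation above can be checked by hand, and the short Pascal sketch below does the same job by printing both bytes in binary. It is offered only as an illustration, ahead of the proper introduction to Pascal in Chapter 2. The names showbits and writebinary are inventions of this sketch, and the (output) parameter in the program heading is an assumption that some compilers require and others ignore.

```pascal
program showbits(output);
(* prints the two bytes of the z-80 instruction add a,100 in binary *)

const opcode  = 198;   (* 11000110, the code for 'add a,n' given in the text *)
      operand = 100;   (* 01100100, the value to be added *)

procedure writebinary(n : integer);
(* writes n (assumed to lie in 0..255) as eight binary digits, *)
(* most significant digit first                                 *)
var i, weight : integer;
begin
  weight := 128;
  for i := 1 to 8 do
    begin
      if n >= weight
        then
          begin
            write('1');
            n := n - weight
          end
        else
          write('0');
      weight := weight div 2
    end
end;

begin (* main program *)
  writebinary(opcode);
  write(' ');
  writebinary(operand);
  writeln
end.
```

Run as it stands, the sketch should print 11000110 01100100, the same bit pattern that the assembler produces.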

Assembly language is still restrictive in two respects however: firstly it is tied to 
the particular machine used, each computer ordinarily having its own machine code, 
which means that assembly language programs cannot normally be transferred from 
one machine to another; secondly the operations available are rather primitive since 
they correspond to things that the processor can carry out directly — things like 
comparing two characters or adding two numbers together. It is said to be a ‘low 
level’ language. 

By contrast each instruction in a ‘high level’ language typically involves many 
machine-code instructions; and because the language is defined on paper (not built 
into a specific processor) it can, in theory, be common to a variety of different 
machines. A more complex translator program, known as a ‘compiler’, is required 
to implement a high level language. It is important to realize that a compiler is just 
another program: its distinctive feature is that its input is a program in one (high 
level) language and its output is a version of the same program in a different (low 
level) language (Fig. 1.1). 


Input         Program         Output

High level    Compiler        Machine      Compilation
language                      code         phase

              Machine code    Results      Execution
                                           phase


Figure 1.1 A compiler is a program 


The first high level language to be widely used was Fortran, developed in 1956. 
In scientific circles Fortran is still widely used today. The next important high 
level language to arrive on the scene was Cobol, first defined in 1960. The US 
government made the provision of a Cobol compiler mandatory for computer 
suppliers competing for government contracts, and this backing ensured its success. 
It has been said that of all programs written by programmers who are paid for their 
work the number of lines in Cobol far exceeds those in all other languages added 
together (and not just because Cobol is rather verbose). In fact, most advertisements 
recruiting programmers in the computer trade press tend to include the magic 
formula ‘two years Cobol experience required’. 

In 1964 Basic hit the scene and the world has never been the same. It was 
devised by two professors at Dartmouth College, New Hampshire, USA (Kemeny
and Kurtz) to provide students with a way of harnessing the power of their newly 
installed time-sharing service without lengthy initiation (Kemeny and Kurtz, 1971). 
Basic is supposed to stand for Beginner’s All-purpose Symbolic Instruction Code,
and it really lives up to its name: for the first time it provided a language that the 
non-specialist could master in a matter of hours. Probably its greatest strength was 
that it was designed for interactive use, in which a user typed something to the 
computer, received some results in reply, typed some more and so on. This mode 
of interaction enabled the novice to make genuine progress quickly. 

It is fair to say that the inventors of Basic did not realize what they had done. 
Here was a programming language that, almost as soon as it was released, began 
to evolve like a natural language. Instead of being defined by a committee or 
imposed by a manufacturer it took off and became public property. In the 
educational world, and later among microcomputer users, it displaced all its 
competitors. Suddenly ordinary people were writing programs. 

So by the late 1960s the situation was that scientists used Fortran because it
was good for heavy calculations (‘number-crunching’), commercial data-processing 
departments insisted on Cobol because it made data definition easy and had good 
facilities for handling large ‘files’ of information and also because it had an 
internationally recognized standard, while the educational world was messing 
about with Basic. There were also many other languages for special purposes,
e.g. Lisp in artificial intelligence, Snobol for string handling and pattern matching,
Coral 66 for real-time applications, etc. 

Many people felt that this situation was unsatisfactory. Also computer scientists 
had long realized that Basic, Cobol and Fortran contained glaring deficiencies that 
made the production of reliable software harder than necessary. Accordingly at 
least two groups set out to produce the ideal all-purpose programming language 
which would include everything that Basic, Cobol, Fortran and Algol 60 (which 
was not widely used but which had considerable influence on programming 
language design) could provide and much more besides — in a much more logical 
manner. The fruits of these Herculean labours were PL/I and Algol 68, which 
appeared in their final forms in 1967 and 1969, respectively. 

The PL/I group was sponsored by IBM. Their efforts were impressive and many 
programmers do now use PL/I, though far fewer than Cobol or Fortran, both of 
which it was meant to supplant. Allowing for the fact that PL/I received 
considerable backing from IBM, the world’s largest computer company, it must be 
judged a failure. The trouble was, it was just too big. Even today PL/I compilers 
tend to be gargantuan pieces of software requiring big machines to run on. And no 
one can claim to know the language fully. 

Algol 68 was produced under the auspices of IFIP — the International 
Federation for Information Processing. Its principal inventor was van Wijngaarden 
of the Mathematical Centre, Amsterdam. Those who like Algol 68 swear by it, and 
it has found some favour in academic circles. Certainly its creation was an intellectual 
feat of some magnitude. But it is still a vast language, needing a big compiler and
therefore a big computer, and it has one important disadvantage: it is one of the 
hardest high level languages to learn. (If you don’t believe me, try it.) Despite its 
theoretical elegance, this fact has condemned it to remain a minority language. 

One of the submissions that IFIP rejected when they chose Algol 68 as the 
language of the future came from a computer scientist working at the Institute 
for Informatics in Zurich, Niklaus Wirth. Wirth is a man who invents programming 
languages the way other people solve crossword puzzles, and he did not let the 
rejection of his proposal by IFIP stop his work on the design of a successor to 
Algol 60. In 1971 he published his language, named Pascal after the great 
seventeenth-century French philosopher who had invented the first automatic 
adding machine. It was slightly modified in response to user experience and released 
in revised form in 1973 (Jensen and Wirth, 1975). 

Like Basic, but unlike most of the other languages we have mentioned, Pascal 
is a language that people do not have to be persuaded to adopt. Soon after its 
announcement user groups sprang up in several countries to exchange information 
and ideas on Pascal, to help each other implement compilers for it, and generally 
to spread the good news. The good news was that the breach between the ordinary 
user, who wants a language that is easy to learn, and the computer scientist, who 
wants a coherent, logical design, had at last been healed. Pascal had similar 
objectives to those of PL/I and Algol 68: the reason it caught on much better is 
that it was concise. The rules of its grammar can be written down on four sheets of 
paper (see Appendix D). It does not have delusions of grandeur. It does not pretend 
to include every facility that everyone has ever thought of. Naturally a Pascal 
compiler need not be a costly and unwieldy program. Above all, it is not a 
committee language. It is one man’s work, and has a unifying philosophy running 
through it. 

As I hope you will discover in succeeding pages, Pascal is not difficult to learn. 
It is almost as easy as Basic, but it has a great advantage over Basic: it actively 
encourages a style in which programs are built up step by step from small
well-defined procedures in a methodical way. It overcomes a major weakness of 
Basic, namely the lack of a proper subroutine facility. This becomes increasingly 
important the larger the program being written (see Chapter 7). 

As an interesting footnote we may examine what happens to a good idea when 
a committee gets hold of it. In the late 1970s the US Department of Defense issued 
stringent specifications for a new computer language, ultimately to be adopted for 
all military applications in the USA (and, by implication, for almost every other 
serious software project in the western world). Several teams submitted proposals. 
Eventually the contenders were whittled down to two, a ‘green’ and a ‘red’ 
language, both based on Pascal. The outcome of all this weeding out process was 
Ada — a new Pascal-based programming language. Ada has everything. It is intended 
for any application from air-traffic control to payroll. If events run their appointed 
course Ada programs will do everything for us from controlling the central heating 
systems bubbling away underneath us as we sleep to launching the missiles that will
start the Third World War over our heads. 

You will gather that I am not an Ada aficionado. Though it was primarily the 
brainchild of one man, Jean Ichbiah of CII Honeywell-Bull, it has been inflated
to the point where it bears all the hallmarks of a committee language. It is large,
long-winded and pedantic: not so much a woolly mammoth (like PL/I), more a
Brontosaurus! 


1.2 Algorithms and programs 


The concept of an algorithm is important in programming. A program is a sequence 
of instructions for a computer. An algorithm is a well-defined procedure for 
obtaining a result in a finite number of steps. Thus an algorithm is a more abstract 
idea than a program. There were algorithms long before there were computers: you 
learned at least one when you were taught how to do long division at school. The 
relationship between algorithms and programs is rather like that between minds 
and brains. 
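
To make the distinction concrete, here is one way an algorithm might be embodied in a Pascal program. The method shown is division by repeated subtraction, a simpler relative of the long division mentioned above, chosen because it plainly finishes in a finite number of steps. The program name and the example values are assumptions made for this sketch, and Pascal itself is not introduced until Chapter 2.

```pascal
program divide(output);
(* integer division by repeated subtraction: *)
(* a well-defined procedure that stops after a finite number of steps *)

const dividend = 47;   (* example value, chosen arbitrarily *)
      divisor  = 5;    (* example value; must be greater than zero *)

var quotient, remainder : integer;

begin
  quotient := 0;
  remainder := dividend;
  (* each pass removes one divisor, so the loop must end *)
  while remainder >= divisor do
    begin
      remainder := remainder - divisor;
      quotient := quotient + 1
    end;
  writeln(dividend, ' divided by ', divisor, ' is ', quotient,
          ' remainder ', remainder)
end.
```

With the values shown the program should report a quotient of 9 and a remainder of 2. The exact spacing of the numbers in the output depends on the compiler.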

Most students of programming are exhorted to design algorithms before writing 
programs. This is intended to make them concentrate on problem-solving rather 
than the arbitrary features of some particular programming language. It is good 
advice. But we live in a practical world. No one is employed as an ‘algorithm 
designer’, though there are millions of programmers; and you cannot get results 
from your computer until your algorithm is embodied in a program. So we will 
not over-emphasize the algorithmic approach. An algorithm may be perfectly 
sound logically, but quite unsuitable as the basis of a computer program. This 
book takes a pragmatic approach, preferring to talk in the less technical terms of 
‘methods’ or ‘techniques’ instead. We assume that your aim is to obtain some 
results that help to solve a problem. A program is merely a means to that end, 
and an algorithm merely one stage in producing a program. 


1.3 Flowcharting


A flowchart for making a cup of tea is shown in Fig. 1.2. You can assume that
this represents instructions for some sort of household robot. Notice how much
attention to detail you have to pay. If a human servant needed everything spelled
out like this you would consider him a moron. But nobody can afford human
servants anyway in this age of machines, so one day you might end up writing
just such a program. Even if you dismiss this scenario as unrealistic, it does give you
the flavour of preparing instructions for a machine (like a computer). Remember: 
nothing will be taken for granted. Every contingency must be provided for in 
advance, or the program will not work. The computer cannot suddenly switch into 
‘common sense mode’ when some unexpected condition arises. 


[Flowchart: boxes listed in order of appearance]

    Fill kettle
    Plug it in and switch on
    Boiling?                 (decision: YES / NO)
    Put tea in pot
    Pour in hot water
    Wait 100 secs
    Pour from pot
    Enough sugar?            (decision: YES / NO)
    Add 1 spoonful
    Drink!

    IMPORTS: Cups, Kettle, Milk, Spoon, Sugar, Tealeaves, Tea pot, Water
    EXPORTS: Tea


Figure 1.2 Tea for two
Flowcharts have been used for a long time as an aid in the specification of
programs. You will have seen that decisions are represented by diamond-shaped 
boxes and actions are put in rectangular boxes. A decision box has more than one 
exit. One is chosen depending on the outcome of the test it contains, so the exit 
lines are labelled. An action box must only have a single exit. Actually there are 
dozens of other shapes each with its own special meaning. But we will spare you 
the trouble of learning them because flowcharts are obsolete already, at least 
for specifying processes. (Better methods are discussed in Chapter 12.) 

Nevertheless flowcharts have visual appeal; and while they are out of date 
for programs they have been given a new lease of life, in the modified form of 
syntax diagrams, for specifying the rules of programming languages. We will come 
to syntax diagrams in Section 2.2. Meanwhile, though you will not find many 
more flowcharts in this book, by all means use them if they help you. 

If you have been taught to use flowcharts remember that their besetting sin is 
a tendency to sprout luxuriantly in all directions. A flowchart that gives the 
appearance of a tangled thicket is worse than useless. Ideally a flowchart should 
fit on one A4 sheet. If it sprawls out with cross-references backwards and forwards 
to dozens of other pages something has gone wrong. To help prevent this, two 
additional symbols can be useful (Fig. 1.3). 


    Undefined          Predefined
    process            process
    ABC                XYZ


Figure 1.3 Extra flowchart symbols


The ‘thought bubble’ on the left was introduced by the Open University. It is 
unofficial in the sense that no standards institute has approved it, but worth using 
because it gives you a chance to avoid clutter. You can admit ignorance: the fiddly 
details of how exactly process ABC will work are postponed until you are ready to 
deal with that aspect of the problem. Sooner or later you will have to face up to it, 
but at the moment it is a distraction. This allows application of the programmer’s 
favourite strategy: divide and conquer. If a problem is too big to tackle in one fell 
swoop, split it into smaller pieces; and if they are still too big to be manageable, 
repeat the splitting until the pieces are manageable. 

The box with lined edges on the right is a variation on the same theme: 
subprocess XYZ has been defined elsewhere (presumably by a separate flowchart), 
therefore it need not be re-stated. This corresponds to a named ‘procedure’ in 
Pascal. Procedures are discussed in Chapter 7. The difference between the two kinds
of box is largely a matter of timing. The undefined process must become a defined 
process in due course if the whole thing is to work. By introducing a time 
dimension we recognize the fact that our understanding of a problem evolves 
dynamically. Probably one reason why flowcharts have fallen into disrepute is their 
rigidity. The conventional symbolism makes no provision at all for evolution or 
growth. So what usually happens in organizations where programmers are required 
to use flowcharts is that the program is written first, then illustrated by a flowchart 
afterwards. Not much use for a method of program design! 

Our tea-making example could have been made tidier if we had encapsulated 
the kettle-boiling process in its own flowchart (Fig. 1.4). This introduces the idea 
that programs have levels, which is one of the most important concepts for a 
beginner to grasp. At the top level actions are stated in very condensed and thus 
very general terms, at lower and lower levels things must be stated in more and 
more detail. The lowest level corresponds to actual computer instructions. Thus a 
complex program typically has a hierarchic structure. We will return to this point 
in Chapter 12. 
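
The idea of levels can be sketched in Pascal, although procedures are not treated properly until Chapter 7. In the outline below the main program states the top level in general terms and two procedures hold the detail. The names teafortwo, boilkettle and brew are inventions of this sketch, and each step merely prints a message in place of a real action.

```pascal
program teafortwo(output);
(* top level stated in general terms; detail pushed down into procedures *)

procedure boilkettle;
(* lower level: the kettle-boiling steps *)
begin
  writeln('fill kettle');
  writeln('plug it in and switch on');
  writeln('wait until boiling')
end;

procedure brew;
(* lower level: the steps that make the tea itself *)
begin
  writeln('put tea in pot');
  writeln('pour in hot water');
  writeln('wait 100 secs')
end;

begin (* main program: the top level *)
  boilkettle;
  brew;
  writeln('pour from pot');
  writeln('drink!')
end.
```

Reading only the main program gives the overall plan at a glance. The fiddly details can be inspected, or changed, one procedure at a time.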

Finally note that flowcharts specify actions and decisions but say nothing about
the data involved. This is a serious omission. If you draw a flowchart, you should
write in beside the diagram at least the following details: (1) what information is
used or needed by the process (imports); (2) what information is altered or generated
by the process (exports). Only then can the specification be considered complete.


[Flowchart: the kettle-boiling steps of Fig. 1.2 drawn as a separate chart]

Figure 1.4 Boiling the kettle


1.4 Errors — the calm approach 


You will have to face the fact that your programs will not work correctly first 
time, at least in the early days. But you should not be complacent about this: to 
err is human, to generate 67 error messages with a 38-line program is pig-headed. 
(Yes, it can be done!) If something like that happens to you it is time to step back 
and think again. You have overreached yourself. You are not programming, you 
are throwing a jumble of statements at a computer and hoping for a miracle. 

Avoiding such embarrassing situations is largely a matter of psychology. It
matters far more that you should remain calm and unflappable than that you 
should be ultra-clever. When things go wrong, as they undoubtedly will to begin 
with, take a long cool look at what you are trying to do. Do not rush at the 
terminal blindly like a bull at a red rag. 

Many programmers respond quite angrily to messages such as 


INVALID SYNTAX IN LINE 200 
or 
DIVISION BY ZERO IN PROCEDURE AVSCORE 


and take them as a personal insult. Some misplaced sense of pride makes them feel 
they must bludgeon the compiler into submission for its insolence. Males seem 
more prone to react this way than females. Somehow women keep a better 
perspective on such matters. 

Consider the sad fate of Mr Macho Machinecoder. You will see him in any 
large computer installation, glued to the terminal hour after hour, eyes narrowed, 
teeth gritted, utterly absorbed in the struggle for mastery. When the night porter 
comes round at 2 a.m. to enquire when he is planning to leave he replies curtly 
that he has ‘just one more little bug’ to fix. He is hooked. Trying to get him off 
the terminal is like trying to prise a limpet off a rock: he will plead, he will cajole, 
he will threaten — anything to hang on to his precious terminal. 

At last, exhausted but triumphant, he sees the program run to completion. 
Honour is satisfied. He goes home to collapse into bed. Only when he awakes next 
day at lunchtime will he notice (if he ever does) that the answers are all wrong. If 
you ask him what he was doing last night he will say ‘debugging’; but in fact he 
was engaged in a misguided attempt to prove to himself and the machine that he 
was boss. 

Yet who is the boss? Not only has he enslaved himself to the machine for a 
whole night, but because he never stepped back to view the problem dispassionately 
he is condemned to repeat the process endlessly as new bugs arise from his 
ill-considered ‘fixes’ and ‘patches’. In this book ‘debugging’ is a dirty word. We will
say more about preventing and detecting errors in Chapter 12. Until you know 
some more about Pascal that advice will not mean a great deal. In the meantime, 
take a tip from a battle-scarred veteran. Avoid panic at all costs. When something 
goes wrong which is not immediately obvious there are only two golden rules: 


(1) Get an up-to-date listing (printout) of your program; 
(2) Go away from the computer and think about it. 


It is no good thrashing about in the hope that something will turn up. You 
have to understand what has gone wrong. 

From the very beginning you should resolve to have the right attitude; without 
it, all the technical expertise in the world is in vain.


2 


A preview of Pascal 


It is time to get down to business. 


2.1 A simple example [BIRTHDAY]


Below is a Pascal program which, given a date, calculates the day of the week on 
which it falls. It works from 1582, when the Gregorian calendar was introduced, 
till 4903, when it will be one whole day out. 

Do not worry if, at first reading, you cannot figure it out. In this chapter we have 
to expose you to a little bit of everything (some input/output, some arithmetic, 
some discussion of data types and so on) just to get started. All these topics will be 
treated in fuller detail in subsequent chapters, when you will be expected to 
understand them. For the moment it is sufficient that you gain an overall idea of 
what Pascal looks like in action. 


program birthday; 
(* calculates day of week given birthdate by zeller's congruence *) 

const thisyear = 1982; 

type days = 1..31; 
     monthly = 1..12; 

var d : days; 
    m : monthly; 
    year : 0..9999; 
    ok : boolean; 
    cent,dday,z : integer; 

begin (* main program *) 
  (* initial message *) 
  writeln('please give your birthday as three numbers:'); 
  writeln('for example, 2nd october 1948 as 2 10 1948'); 
  writeln('when you have finished give a date before 1582 or after 4902'); 
  repeat (* main loop *) 
    write('date of birth? '); 
    read(d,m,year); 
    ok := (year > 1581) and (year < 4903); 
    if ok 
    then 
      begin (* use zeller's congruence formula *) 
        if year <= thisyear 
        then 
          write('you were born on a ') 
        else 
          write('you will be born on a '); 
        if m<3 
        then 
          m := m+10 
        else 
          m := m-2; 
        (* months numbered from 1=march to 12=february *) 
        if m>10 
        then 
          year := year - 1; 
        (* jan and feb considered to belong to previous year *) 
        cent := year div 100;       (* century *) 
        year := year mod 100;       (* year number in century *) 
        z := trunc(2.6*m - 0.2);    (* magic formula *) 
        dday := z + d + year + year div 4 + cent div 4 - 2*cent; 
        dday := (dday + 777) mod 7; 
        (* large multiple of 7 added to ensure nonnegative value *) 
        case dday of 
          0: writeln('sunday'); 
          1: writeln('monday'); 
          2: writeln('tuesday'); 
          3: writeln('wednesday'); 
          4: writeln('thursday'); 
          5: writeln('friday'); 
          6: writeln('saturday') 
        end; (* of case *) 
      end 
    else 
      writeln('you should live so long!'); 
  until not ok; 

  writeln('have a nice day!'); 

end. 

The program uses a method known as Zeller’s congruence which states that the 
day of the week (DDAY) for any date in our era may be computed from the 
expression 


dday = (trunc(2.6*m - 0.2) + d + y + trunc(y/4) + trunc(c/4) - 2*c) mod 7 


where D is the day of the month, M the number of the month (counting March as 1, 
as the program does), Y the year in the century (0 to 99) and C the century number 
(Lee et al., 1978). 

By TRUNC(X) we mean that X is evaluated and any fractional part ignored — 
for example TRUNC(3.75) equals TRUNC(3.25) which equals 3. By Y/4 we mean 
Y divided by 4, and by X MOD Y we mean the remainder, or modulus, after 
dividing X by Y. Thus 22 MOD 7 = 1. The result DDAY will be from 0 to 6 and is 
interpreted such that 0 is equivalent to Sunday, 1 equivalent to Monday . .. and 6 
equivalent to Saturday. 
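
The operations just described can be tried out in isolation. What follows is only a minimal sketch, not part of the BIRTHDAY listing; the program name ARITH is invented for the illustration, and the program heading names the standard file OUTPUT, which some compilers require.

```pascal
program arith(output);
(* illustrative sketch: the arithmetic used by zeller's congruence *)
begin
  writeln(trunc(3.75));    (* fractional part ignored, giving 3 *)
  writeln(trunc(3.25));    (* also 3 *)
  writeln(trunc(48/4));    (* 48 divided by 4, giving 12 *)
  writeln(22 mod 7)        (* remainder after dividing 22 by 7, giving 1 *)
end.
```

The exact spacing of the printed numbers depends on the Pascal system in use, but the values should be 3, 3, 12 and 1.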

Here is the output produced by a run of this program when it was executed on a 
DEC System-10, a large mainframe computer. 


PLEASE GIVE YOUR BIRTHDAY AS THREE NUMBERS: 
FOR EXAMPLE, 2ND OCTOBER 1948 AS 2 10 1948 
WHEN YOU HAVE FINISHED GIVE A DATE BEFORE 1582 OR AFTER 4902 
DATE OF BIRTH? 2 10 1948 

YOU WERE BORN ON A SATURDAY 

DATE OF BIRTH? 30 9 1975 

YOU WERE BORN ON A TUESDAY 

DATE OF BIRTH? 10 1 1954 

YOU WERE BORN ON A SUNDAY 

DATE OF BIRTH? 1 1 1900 

YOU WERE BORN ON A MONDAY 

DATE OF BIRTH? 29 2 1964 

YOU WERE BORN ON A SATURDAY 

DATE OF BIRTH? 7 10 1956 

YOU WERE BORN ON A SUNDAY 

DATE OF BIRTH? 25 12 1999 

YOU WILL BE BORN ON A SATURDAY 

DATE OF BIRTH? 1 1 1 

YOU SHOULD LIVE SO LONG! 

HAVE A NICE DAY! 


Note first of all that this is an example of interactive computer use. The user 
types something at the keyboard, and the computer responds by printing its reply. 
Most large computers (including the DEC System-10) and all microcomputers can 
work in this manner nowadays. 

This is a case of a program built around a ‘magic formula’ taken from a book. 
Yet the formula itself 


z := trunc(2.6*m - 0.2); 
dday := z + d + year + year div 4 + cent div 4 - 2*cent; 
dday := (dday + 777) mod 7; 


is tucked away in three lines of a 66-line program, and could have been squeezed 
into one line if we had wanted. There is much more to writing a program than 
knowing how to calculate the answer. 


2.1.1 Behind the scenes 


A program like this, though small by the standards of ‘real life’ computing, is liable 
to give the novice programmer conceptual indigestion when presented as a fait 
accompli. But programs do not spring ready-made like Athena from Zeus’s 
forehead; so I will attempt to give a slow-motion action-replay of the thought 
processes that led to its creation. This should dispel the aura of mystery and show 
that program design is not a conjuring trick, merely the repeated application of a 
few simple principles. 

I began — let us be honest about it — with the need for an interesting but not too 
complex example to put in my book. While reading about something else my eye 
was caught by a description of Zeller’s congruence method. It fitted the bill because 
a computer could do it easily but a person would take longer to work out the 
answer than it was worth — and that person, speaking for myself, would probably 
make a mistake in the process. So I arrived at the following ground plan. 


program birthday; 
(* declare constants, datatypes and variables *) 
begin 
  (* give user some instructions *) 
  repeat 
    (* obtain the date in question *) 
    (* transform to suitable form *) 
    (* apply zeller's congruence *) 
    (* print the corresponding week-day *) 
  until not ok;   (* no more to do *) 
end. 


This skeleton is nearly, but not quite, a valid Pascal program. It is invalid because 
the variable OK, which will be used to indicate that the given date cannot be 
processed, has not been declared. Even if OK had been declared, the program 
would still do nothing. That does not, however, mean that it is useless. Far from it: 
it is a framework on which to build. 

Let us begin by considering what ‘(*’ and ‘*)’ are for. These are the Pascal 
comment symbols. Everything written between (* and the next *) is treated as 
commentary and ignored by the Pascal compiler. In other words it represents a 
long-winded way of doing absolutely nothing. All comments can be removed from a 
Pascal program without affecting its performance. Why then do I maintain that this 
feature is one of the most important in the language? 

There are two reasons: firstly, comments, judiciously worded, make a program 
easier to read and understand; secondly, and more important, they are indispensable 
to the program development process. During program-writing they correspond to 
the Open University’s thought-bubble symbol described in Section 1.3. They 
permit the application of ‘stepwise refinement’ in which an initially skeletal program 
structure is fleshed out in greater and greater detail. (The Pascal standard actually 
designates the curly braces ‘{’ and ‘}’ to enclose comments, but many printers do 
not have these characters so (* and *) are recognized alternatives.) A comment may 
be placed anywhere in a program where a blank space is permitted. 

Let us briefly consider the other features of Pascal introduced so far. 

The outline begins with 


program birthday; 


which identifies the program by giving it a name, BIRTHDAY. The semicolon (;) is 
important too. It is used to separate statements. We will explain the rules of its use 
in Section 2.3. 

Next will come the declarations of data to be used by the program. We have 
postponed a decision on this until we know what will be required, i.e. till after 
the processing is specified. 

The processing lies between the first BEGIN and the last END. Note that the 
final END is always followed by a full stop (period) to terminate the program text. 

Finally, everything between REPEAT and UNTIL is performed repeatedly until 
the ‘condition’ after UNTIL is satisfied. The condition is the expression between 
UNTIL and the semicolon which must be true or false, in this case NOT OK. 
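
To see REPEAT and UNTIL at work on their own, here is a small sketch that is not taken from the BIRTHDAY program. The name COUNTDN and the variable N are made up for the purpose. The statements inside the loop are always performed at least once, because the condition is only tested when UNTIL is reached.

```pascal
program countdn(output);
(* illustrative sketch: a repeat loop that runs until its condition is true *)
var n : integer;
begin
  n := 3;
  repeat
    writeln(n);        (* prints 3, then 2, then 1 *)
    n := n - 1
  until n = 0;         (* the loop stops as soon as n reaches zero *)
  writeln('lift-off')
end.
```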

We can now proceed with the second phase of our stepwise refinement, leaving 
the data declarations till last. 

The comment 


(* give user some instructions *) 

is replaced by 


writeln('please give your birthday as three numbers:'); 
writeln('for example, 2nd october 1948 as 2 10 1948'); 
writeln('when you have finished give a date before 1582 or after 4902'); 


which should be enough to tell the user what the computer expects. For the time 
being all you need to know is that WRITELN('XYZ') will cause XYZ to appear on 
the default output channel (normally the user’s terminal) immediately followed by 
a new line, while WRITE('ABC') will cause ABC to be printed without moving to a 
new line. 
For 


(* obtain the date in question *) 

we substitute 


write('date of birth? '); 
read(d,m,year); 


which will print 
DATE OF BIRTH? 


on the terminal and then wait for the user to supply three numbers, separated by 
one or more spaces or new lines, for D, M and YEAR respectively. D, M and YEAR 
are called variables since their values can change during a computation (see Section 
3.3). 

We now come to an oversight in the original design: there is no provision for 
handling bad input. When a program reads data from a user at a console there is 
always the possibility of a mistake. In this case we want particularly to ensure that 
the year is within the range for which a meaningful answer can be given. So we 
perform the assignment 


ok := (year > 1581) and (year < 4903) 


and make the rest of the computation conditional on OK, which will be either 
TRUE or FALSE. Our revised schema for the main cycle is 


repeat 
  write('date of birth? '); 
  read(d,m,year); 
  ok := (year > 1581) and (year < 4903); 
  if ok 
  then 
    (* transform to suitable form *) 
    (* apply zeller's congruence *) 
    (* print the corresponding weekday *) 
  else 
    writeln('you should live so long!'); 
until not ok; 


which gives some measure of protection against spurious results. 

I put in this slight revision not to make things more confusing but to reflect the 
history of the program’s creation more faithfully. Even with a short program like 
this one often has second thoughts. 

Notice that, although the user will not be able to give a day number outside the 
range 1 to 31 or a month number outside 1 to 12 because of Pascal’s type-checking, 
the program is still not entirely foolproof: dates such as 31 9 1975 or 30 2 1980 
will still be accepted. 
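
One way to close this loophole is sketched below. It is not part of the BIRTHDAY program as published: the program name DATECHK and the variable MAXDAY are invented here, and the leap-year test is the usual Gregorian rule (every fourth year, except century years not divisible by 400), which the original listing does not spell out.

```pascal
program datechk(input, output);
(* illustrative sketch: rejects dates such as 31 9 1975 or 30 2 1980 *)
var d : 1..31;
    m : 1..12;
    year, maxday : integer;
begin
  read(d, m, year);
  (* work out how many days the given month has *)
  case m of
    4, 6, 9, 11 : maxday := 30;
    2 : if (year mod 4 = 0) and ((year mod 100 <> 0) or (year mod 400 = 0))
        then maxday := 29      (* february in a leap year *)
        else maxday := 28;
    1, 3, 5, 7, 8, 10, 12 : maxday := 31
  end;
  if d <= maxday
  then writeln('date accepted')
  else writeln('no such date')
end.
```

In the full program a test of this kind could be combined with the range check on YEAR when OK is computed.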

The description 


(* transform to suitable form *) 


is turned into Pascal statements that adjust the month number so that months run 
from March (1) to February (12) and alter YEAR so that January and February are 
considered part of YEAR-1. 

Look back now at the program and see if you can see for yourself how 


(* apply zeller's congruence *) 
(* print the corresponding weekday *) 


were expanded. You should be aware that DIV is the integer division operator. 
YEAR DIV 4 has the same effect as TRUNC(YEAR/4) — i.e. the remainder is lost. 
MOD, as already mentioned, gives the remainder so that if YEAR starts as 1999 
the statements 


cent := year div 100; 
year := year mod 100; 


leave CENT with the value 19 and YEAR with 99. You also need to know that the 
CASE statement selects one of many alternatives. In the example program the 
selector is DDAY, so the single statement whose label equals the value of DDAY is 
executed (and none of the others between CASE and END). When DDAY=5, for 
instance, the statement 


writeln('friday'); 


is the only one chosen. 
All that is left is to turn 


(* declare constants, datatypes and variables *) 


into valid Pascal declarations. 
The declaration 


const thisyear = 1982; 


introduces a symbolic constant. The name THISYEAR will have a fixed value of 
1982. The reason I did not just write 1982 in the program wherever THISYEAR 
appears is to make it obvious what change would be needed to run the program in, 
say, 1984. Also, if THISYEAR had been used in many places, only one change, 
such as 


const thisyear = 1984; 


would be enough to change all occurrences. 

The declaration of new data types as in 


type days = 1..31; 
     monthly = 1..12; 


is a particularly interesting feature of Pascal. The types DAYS and MONTHLY now 
become ‘subranges’ of the integers. Legal values for variables of these types are 
restricted to lie within the subrange specified. This is most valuable for error 
prevention. Other typing facilities are described in Section 3.4. 

Finally, 


var d : days; 
m : monthly; 
year : 0..9999; 
ok : boolean; 
cent,dday,z : integer; 


sets up the required variables. YEAR will be able to hold values from 0 to 9999. 
OK is a BOOLEAN variable and can have one of the values TRUE or FALSE. 
CENT, DDAY and Z are integers: they can take any whole-number values that the 
computer is capable of representing, positive or negative. 

The fact that a Pascal programmer can declare new types and can restrict 
variables to having only values that are likely to be reasonable for the problem in 
hand contributes greatly to program robustness. Many older languages (e.g. Basic 
and Fortran) do not offer this ability. Yet it is seldom, for example, that the full 
range of integers is really needed, and by reducing the number of allowable values 
the chances of an error going undetected are reduced. In this program a 32nd day 
or a 13th month would be rejected at once. 

I have elaborated on this example at some length because I believe that sooner 
or later (preferably sooner) the would-be programmer must encounter some valid 
programs. You cannot learn to swim without getting your feet wet. I also wanted to 
demonstrate that programs are not made by black magic but by the methodical 
application of commonsense principles. 


2.2 Syntax diagrams 


You have seen an example of Pascal in use. Now we define some of its rules. To 
define the grammar of the language we employ the method popularized by Wirth — 
‘syntax diagrams’ (Wirth, 1973). A syntax diagram is like a flowchart except that it 
describes data format rather than processing actions. 

Figure 2.1 is a set of syntax diagrams outlining the rules for a subset of American 
male names. There are two kinds of boxes, round and rectangular. The round boxes 
enclose terminal symbols such as ‘Richard’. These stand for themselves and appear 
as they are. The square or rectangular boxes refer to subsidiary syntax diagrams 
defined elsewhere. Thus ‘Forename’ appears as a constituent of ‘Style’ and has its 
own definition, in terms of another syntax diagram. 


[Figure 2.1 Sample syntax diagrams: Name, Title, Style, Forename, Initial, Surname, Extra bit] 

To generate an allowable construction from a syntax diagram you follow the 
arrows from box to box until you reach the exit. Where there is a branch you can 
take either fork. Where there is a reference to another syntax diagram you enter it, 
follow it through, and then return to where you left off in the original diagram. We 
will employ syntax diagrams to clarify the rules of Pascal in this book from now on. 
A complete description of Pascal syntax is given in Appendix D. 

According to the above example the following names are valid: 


President John F Kennedy 
Mr R M Nixon III 
Professor George George George George Washington 
Mr Henry Ford 


The following names could not have been produced from the diagrams in Fig. 2.1. 


Mr Ford 

Abraham Lincoln 
President H Kissinger 
Ronald Reagan 


You should try to understand in what respect each is invalid before reading on. 

Mr Ford is invalid because Style includes at least one Forename or Initial; 
Abraham Lincoln is badly formed because it has no Title; President H Kissinger is 
invalid because H is not one of the instances of Initial; and Ronald Reagan is wrong 
in almost every respect. 

Just to ensure you have grasped all this let us construct one more name from the 
syntax diagrams. 

We enter Name and come across Title. So we enter Title and take a right fork, 
giving 

Dr 


as output. Having completed Title we return to Name and move on to Style. 
Entering Style we branch right and hit Forename, so we enter Forename and, to cut 
a long story short, pick 


John 


and return to Style. Passing along from Forename we fork left and thus dive into 
Initial. Here we choose 


J 


and exit. Now we have finished Style and can return to Name where we proceed to 
the box marked Surname. In Surname we produce 


Johnson 

and return to Name. Leaving the Surname box we bear left and go through Extra 
Bit, which yields 


Jnr 

before finishing Name altogether. We have thus created 

Dr John J Johnson Jnr 


as a result of our travels. (Note that though Dr J J Johnson and Dr John John 
Johnson are permitted Dr J John Johnson is not.) 


2.3 Program structure 


Now we can use a syntax diagram to exhibit the structure of a Pascal program in an 
unambiguous way (Fig. 2.2). We are not yet ready to define Declaration or 
Statement. Declarations are covered more fully in Chapter 3 and statements 
beginning in Chapter 4. However we can define Identifier as shown in Fig. 2.3 
which states that an identifier is a letter optionally followed by any number of 
letters or digits. (Actually Pascal does not guarantee to distinguish between long 
identifiers that are the same in their first eight characters, so we will accept eight 
characters as a maximum length for identifiers in this book.) 
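
The identifier rule can itself be written down as a short Pascal routine. This is only a sketch under stated assumptions: the names IDENTCHK, TEXT8 and ISIDENT are invented, words are held in eight characters padded with blanks on the right, and the test stops at the first blank. It also uses sets and functions, which are not explained until later chapters.

```pascal
program identchk(output);
(* illustrative sketch: applies the identifier rule to a blank-padded word *)
const maxlen = 8;
type text8 = packed array [1..maxlen] of char;

function isident(w : text8) : boolean;
var i : integer;
    good : boolean;
begin
  (* the first character must be a letter *)
  good := w[1] in ['a'..'z', 'A'..'Z'];
  i := 2;
  (* the rest must be letters or digits, up to the first blank *)
  while good and (i <= maxlen) do
    if w[i] = ' '
    then i := maxlen + 1
    else
      begin
        good := w[i] in ['a'..'z', 'A'..'Z', '0'..'9'];
        i := i + 1
      end;
  isident := good
end;

begin
  writeln(isident('thisyear'));   (* a letter followed by letters: acceptable *)
  writeln(isident('dday    '));   (* acceptable *)
  writeln(isident('2cent   '))    (* starts with a digit: not an identifier *)
end.
```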


[Figure 2.2 Program syntax] 


What do these diagrams mean? 

They state that a program has a name, possibly followed by a bracketed list of 
identifiers (which in fact designate files) followed by a semicolon. This is the 
program heading. Next come the declarations, which are for introducing data names 
and so on, followed by a BEGIN, one or more statements separated by semicolons, 
and an END. (The full stop is part of the definition.) 


Figure 2.3 Identifier syntax 


The declarations define the data characteristics; the statements accomplish the 
processing. 


2.3.1 Reserved words and other symbols 


A Pascal program is composed of characters. The letters, digits and most of the 
punctuation marks need no further explanation, but there are also various strings 
of characters designated as single entities. These letter sequences are sometimes 
called ‘reserved words’ — reserved in that the programmer is not free to re-define 
their meanings. 

For example, though BEGIN consists of five letters and END contains three they 
are treated as units by the Pascal compiler. A complete list of reserved words 
appears in Appendix B. 

Certain other compound symbols such as 


:=  <>  <=  >=  (*  *) 


are also treated as indivisible items. Blank spaces are not permitted between the 
characters of a compound symbol or reserved word. 

In most other respects the format of a Pascal program is very flexible. The end 
of a line does not terminate a statement. Pascal statements are separated by 
semicolons. Consequently a line may contain several statements, or one statement 
may extend over several lines. Blank spaces and new lines may be inserted to make 
the program look pleasing to the eye, and to show its logical structure on the page. 
In this book we follow a consistent pattern of indentation which shows how the 
statement groups are related. This freedom from rigid card or margin boundaries is 
a welcome improvement over, for instance, Cobol and Fortran. 


2.4 A word in edgeways 


Among the tribulations of the novice programmer, which make learning a 
programming language seem child’s play in comparison (it is!) are: learning to type, 
learning to use an editor program, finding your way about the system. 

Keyboards vary quite a lot in the placement of punctuation and special 
characters, though they almost all stick to the traditional QWERTY layout for the 
alphabet — which, incidentally, was devised to slow down typists in the 1880s when 
the machinery could not keep pace with nimble fingers. Until the day of widespread 
voice input we must all put up with it. 

Editors are a necessary evil because they let you create (and amend) a file of 
text containing your program, which can then be presented to the compiler for 
translation into machine instructions; but the less said about some of them, the 
better. 

If you use a microcomputer it will undoubtedly have quirks of its own (e.g. you 
may find a program will run perfectly from drive A but fail to load from drive B). 
If you use a large computer you will have to go through various procedures before 
you can even become an authorized user, and then there is the problem of ‘logging 
in’ — not a trivial task on some of today’s systems. 

All these mundane difficulties just make it slightly harder to get useful results 
from the machine. You may be put off by them, especially if you had an image of 
programmers as creatures of pure thought who would lock themselves in a darkened 
room to indulge in abstruse mental gymnastics before bursting out with a cry of 
‘Eureka!’. 

This book cannot help you much with such practicalities. They are too various. 
But do not despair. Eventually we all learn to type at an acceptable rate, even if 
only with two fingers and the occasional thumb. It is not beyond the wit of ordinary 
mortals to master the intricacies of most text-editors either (though they often 
seem designed to make it so). And you can even learn to live with a micro that only 
likes its floppy discs ‘sunny side up’ or a mainframe that forces you to go through a 
complicated ritual in order to ‘log in’. 

Perseverance is essential. Once you have surmounted these barriers you will find 
that the computer is useful after all. 


3 


Declarations and types 


As shown in Fig. 2.2 a Pascal program is divided into two main parts, declarations 
and statements. The declarations come first, and they tell the compiler what sort of 
data is being used. We deal with them in this chapter. The statements come after 
the declarations: they specify the operations performed on the data. We deal with 
them in Chapter 4, and subsequently. 


3.1 Information and its representation 


At the machine level information is encoded in terms of binary digits or bits, which 
may be either 1 or 0. 

For convenience most computers group bits into ‘bytes’ (8 bits) or ‘words’ 
(anything from 8 to 64 bits) and handle a byte or word at a time. The meaning of 
a group of bits is not fixed: it all depends on what you do with it. For instance, to 
the Zilog Z-80 microprocessor the byte 


10010000 


means ‘subtract the contents of the B register from the contents of register A and 
leave the result in A’ when interpreted as an instruction. But it could equally well 
be treated as a positive number in which case it would be equal to 144 decimal 
(i.e. 2⁷ + 2⁴ or 128 + 16). If it were treated as a negative number in the commonly 
used two’s complement notation, however, its value would be -112. It could also 
represent a character in the ASCII code (American Standard Code for Information 
Interchange) (see Appendix A). In this case it would stand for the ‘Data Link Escape’ 
character, with even parity. We could go on and consider 10010000 as a fraction 
or a pair of BCD digits (Binary Coded Decimal) or perhaps something else, but we 
have said enough to show that a byte has no absolute value in itself. The bit pattern 
merely selects one of a set of possible alternatives. With 8 bits there are 256 
possibilities. The machine, or the programmer, assigns a different meaning to each 
of those 256 states. 

Now while computers are built to handle bit patterns, people are not. We find 
them confusing. We would much rather deal in numbers or characters or other 
familiar symbols. One of the chief purposes of a high level programming language 
is to convert from humanly meaningful information to bit patterns and back again 
automatically. Pascal does more than most in this respect. 
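
The two numeric readings of the byte 10010000 discussed in Section 3.1 can be checked with a few lines of Pascal. This is a sketch only: the program name BITVAL and its variable names are invented, and the bit pattern is held as a string of characters purely for convenience.

```pascal
program bitval(output);
(* illustrative sketch: two numeric readings of the bit pattern 10010000 *)
const nbits = 8;
var bits : packed array [1..nbits] of char;
    i, value : integer;
begin
  bits := '10010000';
  value := 0;
  (* read the pattern as an unsigned binary number, leftmost bit first *)
  for i := 1 to nbits do
    begin
      value := value * 2;
      if bits[i] = '1' then value := value + 1
    end;
  writeln('unsigned value: ', value);             (* 144 *)
  (* in two's complement a leading 1 means: subtract 2 to the power 8 *)
  if bits[1] = '1' then value := value - 256;
  writeln('two''s complement value: ', value)     (* -112 *)
end.
```

The same eight bits give 144 or -112 according to the rule applied, which is the point of the passage: the meaning lies in the interpretation, not in the pattern.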


3.2 Constants 


Some items of information are unchanged for the lifetime of a program. They are 
known as constants. Pascal recognizes two sorts of constant, literals and symbolic 
constants. 

A literal identifies itself. Examples of literals are the numbers 


3.14159     (approximation to π) 
100         (one hundred) 


the characters 


'A'         (letter A) 
','         (comma) 


or the character string 

'this is it'. 

A symbolic constant is a fixed value that is given a name in a constant declaration 
at the head of a program. For example, after 


const gravity = 9.81; 


the identifier GRAVITY can be used in the program to refer to 9.81. Giving names 
to constants in this way helps to make a program more readable and easier to 
amend. 

Consider a program that prints a report on an output device with 60 lines to the 
page. If the number 60 is sprinkled through it the reader is unlikely to know what 
is so special about 60. It might also be used for the number of minutes in an hour, 
the number of days credit for customers to pay their bills, and other purposes in 
the same program. But after 


const pagesize = 60; (* lines per page *) 


all is made plain. Wherever PAGESIZE appears in the program its purpose will be 
quite clear. And if the program is switched to run on a different printer with 48 or 
66 lines to the page only one, obvious, change need be made — without redefining 
the length of an hour or altering customers’ credit terms! 

The syntax for constants is shown in Fig. 3.1. Numeric literals that have a 
decimal point or an E, or both, are considered to be of type REAL. Numbers 
that have no point or E are INTEGER constants (see Section 3.4). 


[Figure 3.1 Pascal constants: Constant declaration, Constant] 


Some readers will not be familiar with ‘E notation’. When a number such as 

1.99e33 


is written in Pascal the E stands for ‘times ten to the power of’. The number above, 
which incidentally is the sun’s mass in grams, is equivalent to 199 followed by 31 
zeroes. You can see that this shorthand saves a lot of typing. 

Similarly 


1e-6 


stands for 1 times 10 to the power of -6 which is 1 millionth, or 0.000001. The 
number after E is called the exponent. A positive exponent of n, in effect, shifts 
the decimal point n places to the right; a negative exponent shifts it n places to the 
left. 
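
As a small sketch of E notation in use, the two numbers above can be declared as symbolic constants and printed. The names ENOTE, SUNMASS and MILLIONTH are invented for the illustration, and the layout in which REAL values are printed differs from one Pascal system to another.

```pascal
program enote(output);
(* illustrative sketch: real constants written in E notation *)
const sunmass = 1.99e33;     (* the sun's mass in grams *)
      millionth = 1e-6;      (* 0.000001 *)
begin
  writeln(sunmass);
  writeln(millionth);
  writeln(millionth * 1e6)   (* a millionth of a million: very nearly 1 *)
end.
```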


3.3 Variables 


An identifier is simply a name, as we have seen, chosen by the programmer (but 
not the same as a reserved word) to identify a particular value. The value may be 
fixed, as it is for symbolic constants, but more usually it can vary. 

The concept of a variable is extremely important. Often students are told to 
visualize a variable as a box capable of holding one piece of information such as 
a number whose contents may be altered from time to time during program 
execution. This spells out the distinction between the name of a variable and its 
value. 


[Diagram: the box for variable A before and after assignment] 


Notice that when a new value is given to a variable the old one is lost. 

Variables are declared in Pascal by the VAR declaration. All identifiers used in a 
Pascal program must be declared before use, with the exception of certain 
pre-defined identifiers that the system provides. The declaration of variables not 
only states that they exist, it says what type they are (see Fig. 3.2). 


Figure 3.2 Declaring variables 


3.4 The fundamental data types 


The notion of type is a way of imposing order on the chaos of bit patterns 
described in Section 3.1. Because a variable, in Pascal, has a definite type it may 
only hold values that belong to that type. 

This eliminates at a stroke a whole class of common programming errors that 
plague low level languages — such as trying to do arithmetic on characters or 
trying to interpret data as instructions. Such errors are simply impossible in Pascal 
due to the typing mechanism. 

There are four primary predefined types. 


BOOLEAN   the values TRUE and FALSE 
CHAR      the printable characters (letters, digits and some other signs) 
INTEGER   the whole numbers 
REAL      numbers which can have a fractional part 


There are only two BOOLEAN values, TRUE and FALSE. In Pascal these are 
pre-declared identifiers. 

The set of values in type CHAR varies somewhat according to the computer 
used. Uppercase letters and numerals are always included. Sometimes lowercase 
letters are not. Character constants appear between apostrophes in a program, 
e.g. 'A' or 'Z' etc. (see also Appendix A). 

An INTEGER is a positive or negative whole number or zero. 1 and 5 and 
-9999 are integers. 1.5 and -99.99 are not. Most computers have a limited range 
of integers that can be represented. On some small machines the range may only be 
from -32 768 to 32 767 (using 16 bits). 

REAL numbers can have a fractional part. They are also known as ‘floating 
point’ numbers. Again they are limited to a finite range, though it is usually larger 
than that of the integers, typically from 1.4E-39 to 1.7E38, positive or negative. 
0.0 is also a REAL number. It is important to realize that the precision of the 
machine is not unlimited. Often only the most significant six or seven decimal 
digits are represented. On one computer known to the author 


1.0 - 0.000000007 


gives the result 1.0. This comes as a shock to some people. Accountants, for 
instance, may get very upset if they cannot deal in thousands of millions of 
pounds correct to the nearest penny. 

There are ways of calculating to very high degrees of accuracy, but they involve 
extra work. Indeed a whole branch of numerical analysis is devoted to making the 
best of limited floating point accuracy. 
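
Readers can find out how their own machine treats the subtraction mentioned above with a sketch like the following. The program name PRECISE is invented, and which of the two messages appears depends entirely on how many significant digits the Pascal system keeps for REAL values, so no particular outcome is promised.

```pascal
program precise(output);
(* illustrative sketch: does a very small difference survive in a REAL? *)
var x : real;
begin
  x := 1.0 - 0.000000007;
  if x = 1.0
  then writeln('the difference was lost')
  else writeln('the difference survived')
end.
```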


3.4.1 Enumerated types 
It is also possible for the programmer to create new types. For example 


type 
  seasons = (spring,summer,autumn,winter); 
  eyeshade = (black,brown,hazel,green,grey,blue,pink); 


declares two new types called SEASONS and EYESHADE. The first has four 
possible values, and the four constants are named by the identifiers in brackets. 
The second has seven possible values. Given these type declarations and the variable 
declarations 


var holiday,thistime : seasons; 
eyes : eyeshade; 


it is quite legitimate to write assignments like 


thistime := winter; 
holiday := summer; 
eyes := pink; 


and so on. 

Such types are called enumerated types because the programmer enumerates 
all the possible constants allowed. SPRING, SUMMER, HAZEL, GREEN etc. are 
called constant identifiers and behave like symbolic constants (Section 3.2). They 
cannot have new values assigned to them. Nor is it permissible to use the same 
constant identifier in two different types, as this would lead to ambiguity. Thus 


type primate = (man,chimp,gorilla,orang,gibbon,baboon,macaque); 
     sexrole = (man,woman); 


would be an error because MAN appears in two guises. 


3.4.2 Subrange types 


One can also declare a new type as a subrange of an existing type. There were two 
examples of this in the program in Section 2.1 


type days = 1..31; 
     monthly = 1..12; 


which were both subranges of the INTEGER type. These restricted ranges were 
useful for error detection since the days in a month must lie between 1 and 31 and 
there are only 12 months in a year. 

Subranges of other types may also be declared, as in 


type letters = 'A'..'Z'; 


which is a subrange of CHAR. (On IBM equipment this declaration might include 
non-letters as well as the alphabet.) The REAL type is an exception: subranges of 
REAL are not allowed. 

We also saw 


var 
year : 0..9999; 


where a variable’s type was not given by a type identifier such as INTEGER or 
MONTHLY but by an explicit subrange specification. This is useful if the type 
need not be referred to by name anywhere else. 
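
As a minimal sketch, the subrange declarations above might be combined in a short program such as the following. The program name SUBRANGEDEMO and the variables DAY and MONTH are invented for this illustration. What happens when a value strays outside its subrange depends on the compiler: some reject the assignment at compile time, others report it at run time only if range checking is switched on.

```pascal
program subrangedemo;
(* illustrative sketch: the names used here are not from the text *)

type days    = 1..31;
     monthly = 1..12;

var day   : days;
    month : monthly;
    year  : 0..9999;     (* explicit subrange, no type identifier *)

begin
  day := 29;
  month := 2;
  year := 1984;
  writeln(day:3, month:3, year:5);
  (* day := 32; would be out of range for DAYS *)
end.
```

The WRITELN statement used to display the values is explained in Chapter 5.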


3.4.3 Scalars 
The types we have considered so far, including enumerated and subrange types, are 
known as scalar types. This just means that their values cannot be broken down to 
anything more elementary. More complex types, known as structured types, will 
be discussed in Chapters 8, 9 and 10. 

All the scalar types in Pascal have the property of being ordered. That is to say, 
given two values of one type it is possible to decide whether one ranks greater 
than, equal to or less than the other. With REAL and INTEGER values the ordering 
is what you would expect, so that 


100 is less than 1000 

and 

-0.75 is greater than -1.75 

and so on. But with characters and other types the ordering is less obvious. 
Fortunately ‘A’ < ‘B’ < ‘C’ < ... < ‘Z’ and ‘0’ < ‘1’ < ‘2’ < ... < ‘9’ in all widely 
used character sets. (NB The character ‘1’ should not be confused with the integer 
1 or the number 1.0.) 
BOOLEAN values behave as though the declaration 


type boolean = (false,true); 


had been given. In other words, FALSE comes before TRUE. 
This is extended to programmer-defined types, so that given 


type weekday = (mon,tue,wed,thu,fri,sat,sun); 


we can say that TUE precedes WED or that SUN follows SAT. Often this is very 
useful. 
Note that if we had said instead 


type weekday = (sun,mon,tue,wed,thu,fri,sat); 


WED would still be greater than TUE but SUN would now be less than SAT. 
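
The ordering of scalar values can be checked with a small sketch like the one below. It borrows the comparison operators < and >, which belong with the Boolean expressions of Chapter 6, and it assumes that your compiler will print a Boolean value with WRITELN; the exact form of the printed TRUE varies from system to system. The program name ORDERDEMO and the variable TODAY are invented for the example.

```pascal
program orderdemo;
(* illustrative sketch: shows that scalar values are ordered *)

type weekday = (mon,tue,wed,thu,fri,sat,sun);

var today : weekday;

begin
  today := tue;
  writeln(today < wed);     (* true: TUE precedes WED *)
  writeln(sun > sat);       (* true: SUN follows SAT in this declaration *)
  writeln(false < true);    (* true: FALSE comes before TRUE *)
  writeln('A' < 'B')        (* true in all widely used character sets *)
end.
```

All four comparisons yield TRUE. If WEEKDAY were declared with SUN first, the second comparison would yield FALSE instead.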


4 


Assignment and 
expressions 


The previous chapter introduced the concept of data declarations. In this chapter 
we move on to consider statements, which actually cause things to happen to the 
data. 

The assignment statement is perhaps the most crucial in the whole language. Its 
syntax is deceptively simple, as shown in Fig. 4.1. On the left is a variable, on the 
right is an expression to be evaluated; separating them is the compound symbol := 
which is treated as a single entity. This statement instructs the computer to calculate 
the value of the expression on the right and replace the value of the variable on the 
left by the result. For example 


four := 2 + 2; 


sets the new value (or contents) of FOUR to be 4. 

The difficult part is not the variable but the expression on the right hand side. 
The full syntax for expressions is quite complicated and not very edifying (see 
Appendix D) so we will work up to it in stages, starting with simple examples. 


Figure 4.1 Assignment syntax 


4.1 Operators and operands 


Expressions are built up from operands, such as X (a variable) or 10 (a constant), 
and operators such as + (addition) or - (subtraction). The simplest kind of 
expression is a constant or variable on its own, e.g. 


1024 
atom 


It is also possible to precede an unsigned constant or an identifier by a sign. Thus 


+20 
-deficit 


are expressions. 
The next step in complexity is to link operands by the arithmetic operators 


*     multiplication 
/     division 
DIV   quotient 
MOD   remainder 
+     addition 
-     subtraction 


to produce expressions such as 


balance + credits - payments 
22 / 7 
n div 10 
a * b - c / d 


Note that the multiplication (*) and division (/) operators are not as in standard 
algebra for two reasons. In the first place all operators must be explicit, so that 
using AB to mean A*B is not allowed (for there might be a variable called AB). 
Secondly an expression has to be written on a line so 

C 
- 
D 

must be written as C / D. 


4.2 Precedence and brackets 
The question immediately arises: how should 
a * b - c / d 

be interpreted? Suppose A=10, B=20, C=30, D=40. If we simply go from left to 
right we get 


10 * 20 = 200 
200 - 30 = 170 
170 / 40 = 4.25 


but if we adopt the more usual arithmetic convention that multiplication and 
division have priority over addition and subtraction we would get a different 
result. 


10 * 20 = 200 
30 / 40 = 0.75 
200 - 0.75 = 199.25 


So the order of evaluation is crucial. 

To resolve this Pascal has an implied order of precedence. The operators *, /, 
DIV and MOD have higher precedence than the + and - operators. Operators of 
higher precedence are evaluated before those of lower precedence. In cases of equal 
precedence the operations are carried out from left to right. 

This is almost in accord with standard mathematical usage. Therefore in 

a * b + c 

the multiplication is performed before the addition. If we want the addition first we 
would have to write 

a * (b + c). 


An expression enclosed by a pair of brackets is considered to be an operand. This 
means that all operations within the brackets will be carried out, in the normal 
order, before all those outside the brackets. So brackets may be used to impose 
any desired order of evaluation. Brackets may be nested, as in 


a / (b * (c-d)) 

where the order of evaluation will be 


(1) c-d; 
(2) b * result of 1; 
(3) a / result of 2. 


You can insert brackets that are not strictly necessary, e.g. 


(a * b) - c 
((a + b) + c) 


if you feel it makes the meaning clearer. When in doubt, put in extra brackets: the 
cost is marginal. 
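
The effect of precedence and brackets can be seen in a small sketch such as the one below, which uses the values A=10, B=20, C=30 and D=40 from the start of this section. The program name PRECEDENCE and the variable X are invented for the example, and the WRITELN statement with its format numbers is explained in Chapter 5.

```pascal
program precedence;
(* illustrative sketch using the values A=10, B=20, C=30, D=40 *)

var a, b, c, d : integer;
    x : real;

begin
  a := 10;  b := 20;  c := 30;  d := 40;

  x := a * b - c / d;        (* * and / first: 200 - 0.75 *)
  writeln(x:8:2);            (* prints   199.25 *)

  x := (a * b - c) / d;      (* brackets force the subtraction first *)
  writeln(x:8:2);            (* prints     4.25 *)

  x := a / (b * (c - d));    (* innermost brackets first: 10 / (20 * -10) *)
  writeln(x:8:2)             (* prints    -0.05 *)
end.
```

The second assignment reproduces the strict left-to-right result of 4.25, but only because the brackets ask for it explicitly.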
There is no need to surround the operators *, /, +, - by spaces, so 

a + b 

could just as well be 

a+b 

but DIV and MOD are reserved words. So whereas 

i mod j 

is an expression (remainder of I divided by J) 

imodj 

is an identifier (and probably not a genuine one). 
The most common error in forming expressions is a mismatch of parentheses. 
See if you can spot the deliberate mistake in 


pricetag := ((cost + handling) * (100-markdown)/100) * 
((vatrate+100)/100) + postage); 


without a struggle. In fact there is an extra, erroneous, closing bracket after 
POSTAGE. Even the correct form is rather inscrutable. Expressions this complex 
(or more so) are better written in stages, as for instance 


discount := (100-markdown) / 100; 
vatratio := (vatrate+100) / 100; 
subtotal := (cost + handling) * discount * vatratio; 
pricetag := subtotal + postage; 


which makes it clearer what is going on, even if it requires a little more effort on 
the part of the programmer. 

If you find you do not have the same number of opening and closing brackets, 
give some thought to breaking the computation into smaller steps and holding the 
intermediate values in temporary variables. 


4.3 Types of expression 


We have been dealing with expressions of numeric type (INTEGER or REAL). We 
will consider character values in Section 4.4 and Boolean expressions in Chapter 6. 
Boolean expressions yield a result that is either TRUE or FALSE. 

Before we go on it is worth asking how we know the type of an arithmetic 
expression. To do so we have to consider both the operator and its operands, but 
the type of an expression can always be worked out before it is evaluated. 

The type of a constant is self-evident; the type of a variable is given in its 
declaration. 

The operators *, + and — give result types as follows: if either operand is REAL 
the result is REAL; if both operands are INTEGER the result is INTEGER. 

The division (/) operator always gives a REAL result, even if both operands are 
INTEGER. 

The DIV and MOD operators must take INTEGER operands and always give 
INTEGER results. 

These rules are summarized, and extended to other types, in Appendix B. 
Notice that a subrange of INTEGER may be used where an INTEGER is expected. 
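
The rules for result types can be tried out with a sketch like the following. The program name RESULTTYPES and the variables I and R are invented for the example; the printed layouts assume the formatting rules described in Chapter 5.

```pascal
program resulttypes;
(* illustrative sketch: result types of the arithmetic operators *)

var i : integer;
    r : real;

begin
  i := 7 div 2;        (* INTEGER quotient *)
  writeln(i:4);        (* prints    3 *)

  i := 7 mod 2;        (* INTEGER remainder *)
  writeln(i:4);        (* prints    1 *)

  r := 7 / 2;          (* / always gives a REAL result *)
  writeln(r:6:2);      (* prints   3.50 *)

  r := 6 / 3;          (* still REAL, even though the division is exact *)
  writeln(r:6:2);      (* prints   2.00 *)

  r := 2 * 3.5;        (* one REAL operand makes the result REAL *)
  writeln(r:6:2)       (* prints   7.00 *)
end.
```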
4.3.1 Compatibility 


These rules allow us to determine the resultant type of an arithmetic expression. 
Granted that we know the type, how do we know whether it is legal to assign a 
value of that type to a variable of another type? 

A variable of type T may be assigned a value of type t when 


(1) T is the same type as t; or 
(2) T contains t as a subrange; or 
(3) T is REAL and t is INTEGER. 


Notice that if R is REAL and I is INTEGER 

r := i 

is permitted but 

i := r 

is not. Why? Because the first involves no loss of precision (all integers have a REAL 
equivalent) but the second does (some REALs have no integer equivalent, e.g. 
should 3.5 be 3 or 4?). 

PS Some compilers would not consider that P and Q and R in 


type cent = 0..99; 
var p : 0..99; q : 0..99; r : cent; 


are of the same type! If you want to be sure that variables are identical in type 
either declare them in the same group or with exactly the same type identifier, as in 


var p,q,r : 0..99; 
or 
var p:cent; q,r: cent; 


for instance. 


4.4 Standard functions 


Pascal has a number of predefined functions for commonly needed calculations, and 
for type conversion. 


ARITHMETIC 
ABS(X)      Computes absolute value of X. Result is same type as X, which 
            must be INTEGER or REAL. 
ARCTAN(X)   Arctangent of X; the result is in radians. 
COS(X)      Cosine of X, where X is in radians. 
EXP(X)      e to the power of X. 
LN(X)       Natural logarithm (base e) of X. 
SIN(X)      Sine of X, where X is in radians. 
SQR(X)      X squared. X is INTEGER or REAL; result is the same type as X. 
SQRT(X)     Square root of X. 


Note that for ARCTAN, COS, EXP, LN, SIN and SQRT the argument X may be 
INTEGER or REAL, and the result is REAL. 


CONVERSION 
CHR(I)      I is an INTEGER and the result is the character whose ordinal 
            position in the character collating sequence is I. For example 
            with the ASCII character set CHR(65) = ‘A’. 
ORD(C)      C is a character; the result is the ordinal number of that 
            character in the collating sequence. For example in ASCII 
            ORD(‘A’) = 65. (Also used to give position in an enumerated 
            type: see below.) 
ROUND(R)    R is a REAL value; the result is the nearest INTEGER to R. 
TRUNC(R)    R is a REAL value; the result is the INTEGER obtained by 
            truncating it to a whole number, i.e. by ignoring the fractional 
            part. 


SEQUENCE 
PRED(N)     The predecessor of N, where N is of any ordinal type; that is 
            any scalar type except REAL. PRED(N) is undefined if N is the 
            first item in the enumeration. 
SUCC(N)     The successor of N, where N is of any ordinal type. SUCC(N) is 
            undefined if N is the last item in the enumeration. 


The arithmetic functions can be used as terms in an arithmetic expression. For 
example 


root := sqrt(49); 


would assign the value 7.0 to ROOT. 
The conversion functions can be used to overcome the problem of assigning a 
REAL value to an INTEGER variable. For instance, 


rate := round (dist/time); 


assigns to RATE the nearest whole number to DIST/TIME which is an expression 
that will have a REAL value (see Section 4.3). 
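
The difference between ROUND and TRUNC shows up most clearly with negative numbers, as in the sketch below. The program name CONVERT and the variables R and I are invented for the example. Values lying exactly halfway between two whole numbers (such as 3.5) are deliberately avoided, since their treatment can differ from one implementation to another.

```pascal
program convert;
(* illustrative sketch: REAL to INTEGER conversion *)

var r : real;
    i : integer;

begin
  r := 3.7;
  i := round(r);   writeln(i:4);    (* prints    4 *)
  i := trunc(r);   writeln(i:4);    (* prints    3 *)

  r := -3.7;
  i := round(r);   writeln(i:4);    (* prints   -4 *)
  i := trunc(r);   writeln(i:4)     (* prints   -3 *)

  (* i := r; would be rejected: see Section 4.3.1 *)
end.
```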
The sequence functions can be used to work through the values of an ordered 
type. With INTEGER values 


pred(i) = i - 1 
succ(i) = i + 1 


so they are seldom used. But + and - are not available on enumerated types, which 
is where PRED and SUCC are most useful. For example if 


type m = (jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec); 


then the value of PRED(MAY) is APR and the result of SUCC(AUG) is SEP. 
NB The results of PRED(JAN) and SUCC(DEC) would be undefined. 

The ORD function may be applied to enumerated types as well as to characters. 
It gives the ordinal position as defined in the type declaration, counting from zero 
(not 1). Thus ORD(DEC) for the above type would be 11 and ORD(AUG) would 
equal 7. 
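
A short sketch brings the sequence and conversion functions together. It uses the type M declared above; the program name SEQUENCE and the variable MONTH are invented for the example. Because an enumerated value may not be printable directly on every system, the sketch prints its ordinal position instead. The last two lines assume the ASCII character set.

```pascal
program sequence;
(* illustrative sketch: PRED, SUCC, ORD and CHR *)

type m = (jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec);

var month : m;

begin
  month := pred(may);                 (* APR *)
  writeln(ord(month):3);              (* prints   3 *)

  month := succ(aug);                 (* SEP *)
  writeln(ord(month):3);              (* prints   8 *)

  writeln(ord(dec):3, ord(aug):3);    (* prints  11  7 *)

  writeln(ord('A'):3);                (* prints  65 with ASCII *)
  writeln(chr(66))                    (* prints B with ASCII *)
end.
```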

There is also a function ODD(I) which takes an INTEGER argument and delivers 
a BOOLEAN result: TRUE if I is odd, FALSE if it is even. There are other 
predefined Boolean functions for dealing with input and output which will be 
described in Chapter 9. 


5 
Simple 
input/output 


We cannot make use of the computer unless we can supply it with input information 
and get back the results it calculates in a legible form. 

In Chapter 9 you will see that all input and output in Pascal is based on the 
notion of a sequential file of data. To get started however, we will make use of four 
standard procedures (READ, WRITE, READLN and WRITELN) which do not 
require any knowledge of file-handling. 

We shall assume that these provide, in their simplest form, input and output via 
the user’s terminal, and hence that the user can employ them to interact with a 
program. In old-fashioned systems running in batch mode the READ instruction 
may take data from a card-reader and WRITE may produce listing on a line-printer, 
but most modern systems — including all microcomputers — assume that the 
primary channel for communication with a program is the console terminal. This 
terminal may be a visual display unit (VDU) comprising screen and keyboard or a 
teleprinter device which gives printed output (‘hard copy’). 


5.1 READ and WRITE 


Figure 5.1 is a syntax diagram for the READ statement. What it says is that the 
keyword READ is followed by a list of variables, separated by commas if there is 
more than one, in brackets. What happens when a READ statement is executed is 
that the computer pauses and waits for the user to type values on the terminal which 
will be assigned, in turn, to each of the variables in the input list. When all the 
variables have received new values in this manner execution continues with the 
statement following READ. 

Figure 5.1 Syntax of READ 

For example the statement 


read(r,i); 


would expect the user to supply two values, the first for R and the second for I. 
Suppose R is REAL and I is INTEGER, then the user, by typing, 


33.33 19 


would give R the value 33.33 and I the new value 19. 

The values typed in for numeric variables should be in the permissible form for 
constants. This means that 1E-7, for instance, is legal for a REAL number. Values 
for INTEGER variables should consist of digits only, optionally preceded by a sign. 
Numbers are separated by one or more spaces or new lines. This implies that a 
number may not contain an embedded blank space. (Most versions of Pascal allow 
you to type an integral constant as input to a REAL variable so that 10 would 
serve for 10.0 but you should check up on this point. Your system may not allow 
it.) 

For CHAR variables no single-quote marks are needed. Any character will be 
read (including a space or a quotation mark) just as typed. Leading spaces are not 
ignored when reading character variables, as they would be for numeric variables. 

The identifiers TRUE and FALSE will be recognized as input to BOOLEAN 
variables. However, other scalar types may not be. Some Pascal compilers are more 
flexible than others in this respect. We will assume from now on that, given the 
declaration 


var howhigh : (tiny,middling,tall,gigantic); 
the user can reply 

tall 
in response to the statement 

read(howhigh); 


and expect a sensible result — i.e. HOWHIGH := TALL. However you should check 
your software manual on this point: it might not work properly with your compiler. 

The syntax for WRITE is shown in Fig. 5.2. Notice that expressions are written, 
not just variables. What happens is that each expression in the list is evaluated and 
its value printed on the terminal. 

The format construction is defined as shown in Fig. 5.3 where the ‘integer’ may 
be any integer-valued expression. 

Figure 5.3 Format specification 

The first integer after the colon defines the field width. Thus 


write(a:20); 


states that the value of A must occupy 20 character positions on output. Items are 
padded on the left with blanks to achieve the required width. Another way of saying 
this is that output fields are right-justified. 

The second number in the format specification makes sense only for REAL 
values. It specifies the number of digits after the decimal point. If a REAL value is 
printed with no second integer in the format it will appear in scientific notation, 
using E and an exponent. When A is printed as below 


write (a:17:7); 


there will be 7 digits after the point and 8 before it (not 17 before it). One position 
will be taken up by the decimal point itself and one more by the sign ‘-’ if A is 
negative, otherwise a space. Altogether 17 characters will be printed. Remember 
that the first number indicates total width, for REAL as for other types, not the 
number of positions prior to the decimal place. For example, if A=12.34 then the 
statements 


write(a:12:4); write(a:15); 

would produce 

     12.3400   1.234000e+01 


on output. 
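
To see the padding for yourself, try a sketch like the one below, which prints each item between square brackets so that the leading blanks become visible. The program name FORMATDEMO and the variables A and N are invented for the example. Scientific notation is left out because its exact layout differs between systems.

```pascal
program formatdemo;
(* illustrative sketch: field widths and right-justification *)

var a : real;
    n : integer;

begin
  a := 12.34;
  n := 42;
  writeln('[', n:6, ']');        (* prints [    42] *)
  writeln('[', a:12:4, ']');     (* prints [     12.3400] *)
  writeln('[', a:8:2, ']');      (* prints [   12.34] *)
  writeln('[', 'a':3, ']')       (* prints [  a] *)
end.
```

In each case the first number gives the total width of the field, including any digits after the decimal point.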


If you leave out all format information your output will be laid out according 
to the default standard field widths for your system. 


NOTE 

You will not find syntax diagrams for READ and WRITE in the official textbooks. 
This is because they are treated as procedures not statements. However when you 
read Chapter 7 you will see that READ, WRITE, READLN and WRITELN are 
peculiar in taking a variable number of parameters, which no user-defined procedure 
may do. Furthermore WRITE and WRITELN are peculiar in taking the format 
construction. So it is more natural to consider them as statements in their own 
right, as is done here. 


5.2 READLN and WRITELN 


READLN and WRITELN are like READ and WRITE except that they do not 
ignore the end of lines. 
READ just treats the end of a line of input as a separator equivalent to a space. 


So 

read(i,j); read(k,l); 


would have the same effect whether the data were typed 


1 2 3 4 

or 

1 2 
3 4 

whereas 


readln(i,j); readln(k,l); 


would not. 

In brief, READLN always moves to the start of a new line after finishing its 
input. At an interactive terminal this means that it will wait until the user presses 
the Carriage Return key before proceeding. After a READLN any subsequent input 
will be taken from the start of a fresh line. Thus, given the input 


10 20 
40 


the statement READLN(I1,I2) would set I1 to 10 and I2 to 20; but the statements 
readln(i1); readln(i2); 


would set I1 to 10 and I2 to 40, because the first READLN would advance to a 
new line after getting the information it required. 
WRITELN causes a new line to be fed after completing its output. So whereas 


write(a); write(b); 

would put A’s and B’s values on the same line adjacent to one another 

writeln(a); writeln(b); 


would evaluate and print A on the current line then move to the beginning of the 
next line to print B’s value. After doing that it would move on to the following line 
so that the next item to be printed would be underneath B’s value. 

In addition, READLN and WRITELN can be used with no arguments at all. 
READLN on its own is used to ignore the rest of the current input line. WRITELN 
on its own is used simply in order to start a fresh line of output. On most terminals 
the practical difference is that while 


write(a,b,c,d); 
prints four values on the current output line, 
writeln(a,b,c,d); 


does this and also appends the Carriage-Return/Line-Feed characters to the end. 
Having covered basic I/O (Input and Output) we are at last in a position to write 
a very simple program. 
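
Before that, here is a minimal sketch of the difference between WRITE and WRITELN on output. The program name LINES is invented for the example, and explicit field widths are used so that the layout does not depend on your system's defaults.

```pascal
program lines;
(* illustrative sketch: WRITE stays on the line, WRITELN ends it *)

begin
  write(1:3);  write(2:3);     (* both on the same line:   1  2 *)
  writeln;                     (* ends that line *)
  writeln(3:3);                (* 3 on a line of its own *)
  writeln(4:3, 5:3)            (* 4 and 5 share the last line *)
end.
```

The output occupies three lines: the first holds 1 and 2, the second holds 3, and the third holds 4 and 5.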


5.3 Example program [INFLATER] 


The key to understanding Pascal is practice. One example is worth many pages of 
explanation, and one interactive session on the computer is worth many book 
examples. So you are recommended to try out the following simple program on 
your computer. 


5.3.1 The plan 


We start with the formula for compound interest. 
a' = a * (1 + r/100) ↑ n 


This means that an amount invested of A yields a return A’ after N years at R 
percent interest. I have used the sign ‘↑’ to refer to raising to a power. 

This formula becomes more compelling if we use it in connection with inflation 
rather than investment. At the time of writing the inflation rate in Britain was over 
15%. Many people do not realize just how phenomenally high, historically speaking, 
that is. Nor do they realize that, if a similar rate is continued, we will all be 
millionaires (on paper) early next century. To bring it home with some immediacy 
we want to write a Pascal program that will enable us to answer questions such as 
‘what will a 20-penny ice cream cost in 25 years’ time given annual inflation of 
15%?’. The answer is rather startling — £6.58. 

Our overall design is a simple straight-line flowchart (Fig. 5.4). 

Box 1 would often be forgotten; but it is part of the art of programming. To 
make the program useful it must be usable. Therefore it must tell its user what it 
expects. Even if no one but you uses your programs you will need reminders later 
concerning the input that is wanted. 

1. Give user instructions 
2. Read: Amount, Rate, Years 
3. Calculate final value 

Figure 5.4 Inflation program 


We can write something like the following: 


writeln('this program shows the effect of inflation.'); 
writeln; 
writeln('please give 3 numbers:'); 
writeln('initial amount, inflation rate (%) and years.'); 
writeln('separate the numbers by spaces.'); 
writeln; 


Having completed the instructions, we can proceed to box 2. 


write('your 3 numbers are ? '); 
read(oldvalue, rate, time); 

We have used WRITE and READ so that the question and its answer will appear on 
the same line. 
Box 3 causes a difficulty. We want to write 


newvalue := oldvalue * (1 + rate / 100) ↑ time; 


where ↑ signifies raising to the power, but Pascal has no such operator. We will have 
to resort to logarithms, making use of the fact that EXP is the inverse function 
(antilog) of LN. Thus EXP( LN(A) * B ) = A ↑ B, A to the power of B. We also 
introduce a temporary variable to make the calculation clearer. 


gain := exp(ln(1 + rate/100) * time); 
newvalue := oldvalue * gain; 
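
The logarithm trick can be checked in isolation with a sketch like the following, which raises 2 to the power 10 and then works out the ice-cream example from Section 5.3.1 (20 pence, 15% inflation, 25 years). The program name POWER and the variables BASE, EXPONENT and ANSWER are invented for the example. Note that LN is only defined for positive arguments, so the method works only when the number being raised to a power is greater than zero.

```pascal
program power;
(* illustrative sketch: raising to a power by means of EXP and LN *)

var base, exponent, answer : real;

begin
  base := 2;
  exponent := 10;
  answer := exp(ln(base) * exponent);          (* 2 to the power 10 *)
  writeln(answer:10:2);                        (* prints    1024.00 *)

  answer := 0.20 * exp(ln(1 + 15/100) * 25);   (* the ice-cream example *)
  writeln(answer:10:2)                         (* prints       6.58 *)
end.
```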


Now we can print the result. We could just say 
writeln('final amount is ', newvalue:12:2); 
but it is much more polite to remind our user what the initial conditions were. 


writeln('after ', time:7, ' years at ', rate:8:2, '%'); 
writeln('a starting sum of ', oldvalue:12:2, ' becomes ', newvalue:12:2); 


Observe that you can use character strings, enclosed within single quotation marks 
(apostrophes), as elements of an output list. By interspersing messages among your 
results they can be made more meaningful. 

Mention of politeness is not at all out of place in a programming text. We put a 
‘please’ in the initial instructions for this reason. Remember that your program will 
be your representative in a dialogue with somebody, and treat that person (even if 
it is yourself) with the respect people deserve. As programmer you have a 
responsibility not to let loose an ill-mannered program on the unsuspecting public. 

Once you accept that in writing an interactive program you are writing the script 
for a conversation that has yet to take place you will never again be tempted to 
sacrifice common courtesy to save a few lines of typing. 


5.3.2 The program 


All that remains now is to write out the data declarations for each variable we have 
employed and to put everything together in the right order. 


program inflater; 
(* program to demonstrate the effect of inflation *) 

var time : 0..9999; 
    oldvalue, newvalue, rate, gain : real; 

begin 
  (* instructions *) 
  writeln('this program shows the effect of inflation.'); 
  writeln; 
  writeln('please give three numbers:'); 
  writeln('initial amount, inflation rate (%), and years.'); 
  writeln('separate the numbers by spaces.'); 
  writeln; 

  (* user input *) 
  write('your 3 numbers are ?');  read(oldvalue,rate,time); 

  (* main calculation *) 
  gain := exp(ln(1 + rate/100) * time); 
  newvalue := oldvalue * gain; 

  (* output of results *) 
  writeln; 
  writeln('after ',time:7,' years at ',rate:8:2,'%'); 
  writeln('a starting sum of ',oldvalue:12:2,' becomes ', 
          newvalue:12:2); 
end. 


5.3.3 Sample runs 
Here is a printout of what happened when this program was run a couple of times 
on the Research Machines 380Z, a microcomputer system. 

THIS PROGRAM SHOWS THE EFFECT OF INFLATION. 

PLEASE GIVE THREE NUMBERS: 
INITIAL AMOUNT, INFLATION RATE (%), AND YEARS. 
SEPARATE THE NUMBERS BY SPACES. 

YOUR 3 NUMBERS ARE ?9250.0 15.0 24 

AFTER      24 YEARS AT    15.00% 
A STARTING SUM OF      9250.00 BECOMES    264781.60 


THIS PROGRAM SHOWS THE EFFECT OF INFLATION. 

PLEASE GIVE THREE NUMBERS: 
INITIAL AMOUNT, INFLATION RATE (%), AND YEARS. 
SEPARATE THE NUMBERS BY SPACES. 

YOUR 3 NUMBERS ARE ?0.20 15.0 20 

AFTER      20 YEARS AT    15.00% 
A STARTING SUM OF         0.20 BECOMES         3.27 


What these figures mean is that (assuming steady 15% inflation) a house that cost 
£9250 in 1976 will fetch over a quarter of a million pounds at the turn of the 
century, and an ice cream costing 20 pence in 1980 will be selling for £3.27 in the 
year 2000! The currency units do not really matter. We could equally well have said 
that today’s 20-cent ice cream will cost $3.27 at the end of the century — except 
that the US inflation rate is somewhat lower. 


5.4 Exercises 
Here are some suggestions for your homework. 


1. The number of atoms left (N’) of a radioactive element after T time units can be 
calculated from the formula for radioactive decay 


n' = exp(-(k*t)) * n 


where N is the original amount and K is a parameter giving rate of decay for each 
element which depends on the half-life. The relation between K and the half-life is 


k = 0.693 / h 


where H is the half-life expressed in the same time units as T. 

Write a program using the framework of INFLATER as a guide which will accept 
an initial amount (N) and values for H and T and compute the amount remaining 
after T time units. The new amount should be printed in a legible format for the 
user. 


2. Write a program which accepts as input three integers DIST (a distance in 
metres), MINS and SECS (minutes and seconds taken to cover that distance) and 
produces as output the speed for the journey in kilometres per hour, miles per hour 
and metres per second. There are 1609.344 metres in a mile (i.e. 1 yard = 0.9144 m). 

Ideally your program should reject odd input including negative times and 
quantities of seconds greater than 60. 


3. The area of a triangle is given by the formula 
area = sqrt(s * (s-a) * (s-b) * (s-c)) 

where s = 0.5 * (a + b + c). 
Write a program that reads three numbers A, B and C from the keyboard 
representing the lengths of three sides and prints the area of that triangle. Your 
program should cope with bad input (e.g. when A + B < C) without an execution 
error. 


4. A garage charges $4.75 per complete hour or part thereof for parking. Write a 
program which will read in two times typed in military hours (e.g. 1000 for 10 a.m. 
or 1730 for 5.30 p.m.) indicating entry and exit time and work out how much to 
charge the driver. 

Hint: the time from 1000 to 1230 is not 230 minutes; it is 150. You must split 
your times into hours and minutes using the MOD and DIV operators before you 
can find the difference between them. 
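
The splitting step in the hint might be sketched as follows, using the two times mentioned above. This covers only the conversion to minutes, not the charge itself. The program name CLOCK and the variables ENTRY, LEAVE and MINUTES are invented for the example.

```pascal
program clock;
(* illustrative sketch: elapsed minutes between two military times *)

var entry, leave, minutes : integer;

begin
  entry := 1000;     (* 10 a.m. *)
  leave := 1230;     (* 12.30 p.m. *)

  (* DIV 100 gives the hours, MOD 100 gives the minutes *)
  minutes := ((leave div 100) * 60 + leave mod 100)
           - ((entry div 100) * 60 + entry mod 100);

  writeln(minutes:5)     (* prints   150 *)
end.
```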


5.5 Quiz 


Here is a short multiple-choice test which enables you to check your understanding 
of the concepts presented so far. 


5.5.1 The questions 


Select one of a, b, c or d as the correct answer and mark it in pencil. Do not look at 
the answers till you have made a selection for each question, even if it is a guess. 
That’s an order! 


1. The declaration 
type year = (jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec); 


declares YEAR as 
(a) a structured type 
(b) an enumerated type 
(c) a subrange type 
(d) a fundamental type? 


2. The number 3.14159 is 
(a) a symbolic constant 
(b) an identifier 
(c) a literal 
(d) an integer? 


3. Which of the following is not a predefined Pascal type? 


(a) BOOLEAN 
(b) CHARACTER 
(c) INTEGER 
(d) REAL 


4. The following diagram is 

(a) a Pascal program 
(b) a program schema 
(c) a flowchart 
(d) a syntax diagram? 

FOR  variable  :=  expression  TO | DOWNTO  expression  DO  statement 


5. An instruction in Pascal that causes an action to be performed is 


(a) a statement 

(b) an imperative sentence 
(c) an expression 

(d) a declaration? 


6. Assuming R is a REAL variable, PI is a REAL constant and AREA is an 
INTEGER variable, say which one of the following statements is valid. Only one is. 


(a) area := 3.14159 * sqr(r) 
(b) pi := 3.14159 
(c) r := (r * sqrt(r) 
(d) area := round(2 * pi * r) 


7. The statement 

writeln('a':3, 2:1); 

produces which of the outputs below? 
(a) A321 
(b) A 
    3 
    2 
    1 
(c)   A2 
(d) A2.1 


8. The value of 

10*10-5/2 

is 
(a) 25 
(b) 25.0 
(c) 97.5 
(d) 47.5? 
5.5.2 The answers 
How dare you turn the page without even attempting the quiz! What am I wasting 
all this effort on you for? Go on, do it for your own sake, please. 


1. (a) No. Structured types are dealt with in Chapter 8 onwards. 
   (b) Yes. 
   (c) No. A subrange type (for example 0..99) places limits on another type. 
   (d) No. The fundamental types are BOOLEAN, CHAR, INTEGER and REAL. 

2. (a) No. A symbolic constant has a name. See Chapter 3. 
   (b) No. An identifier starts with a letter and may only contain letters and digits. 
   (c) Yes, a literal stands for itself (literally). 
   (d) No. Integers do not have decimal points. 

3. (a) No. BOOLEAN is a predefined type. 
   (b) Yes, CHAR is the Pascal name, not CHARACTER. 
   (c) No. 
   (d) No. 

4. (a) Certainly not! 
   (b) No. You’re guessing. Who mentioned program schemas anyway? 
   (c) Not quite. Flowcharts illustrate computations, not rules of the language. 
   (d) Yes, of course. 

5. (a) Correct. 
   (b) No. You must have learned Cobol first. 
   (c) No. An expression is just a constituent part of (some) statements. 
   (d) No. Declarations describe data attributes not processing activities. 

6. (a) No. You cannot assign a REAL value to an INTEGER variable. 
   (b) No. You cannot assign a value to a constant in this way. 
   (c) No. Too many left brackets. 
   (d) Yes, even though it looks like the wrong formula it is grammatically correct 
       Pascal. 

7. (a) No. Go back and re-read this chapter. 
   (b) No. WRITELN only throws a line after all of its output. 
   (c) Yes. Note the two leading spaces. 
   (d) No. Think again! 

8. (a) No. The / ensures a REAL, not INTEGER, result. 
   (b) No. The * and / are done before the subtraction. 
   (c) Yes. 
   (d) No. You have not understood the precedence of operators. Read Section 
       4.2 again. 


A score of four or less out of eight indicates that you are going too fast. You 
need to re-read Chapters 2 to 5. 


6 
Looping and 
grouping 


With the constructs presented in Chapters 3 to 5 we can only write simple 
straight-line programs. In other words we can only write trivial programs. 

The two concepts which give the digital computer its great power are the idea 
of a decision and the idea of a loop. A decision implies that the computer can 
choose from among alternative courses of action on the basis of data read in or 
intermediate results. A loop describes a repetitive piece of program; and the 
computer really comes into its own when required to perform a repetitive task. 
Indeed much of the art in programming consists of describing a solution in terms 
of the repetition of a few simple operations over and over again. 


6.1 Control structures 


It is usual to classify the control structures of a program into four kinds: 


1. Sequence 
2. Selection 
3. Repetition 
4. Embedding. 


Sequence is absolutely basic. It is the normal case, and in Pascal when one 
statement is written after another that means it will be executed after the other. 
Sequence is so fundamental that we have already covered it implicitly. We say 
nothing more of simple sequence in this chapter since there are no special 
statement forms for it (although beginners find the sequential mentality quite 
hard to acquire, as the machine forces the programmer to specify the order in 
which actions are carried out even when it does not matter). 

The idea of selection in a program means that the computer can choose one 
from several possible paths depending on conditions that have arisen during the 
run. Without selection, programs would be totally inflexible. 

A portion of program that is repeated is called a ‘loop’. Such a process has a
characteristic flowchart as shown in Fig. 6.1. Notice that a loop has an initialization 
phase (often forgotten), an exit test and a process which is repeatedly executed — 
usually referred to as the ‘loop body’. It is important that the actions in the body 
of the loop can alter some variable used in the exit test so that after a finite number 
of repetitions the test will succeed. Otherwise you have an infinite loop, a bugbear 
of careless programmers. 


Figure 6.1 A loop (flowchart; surviving box labels: ‘Set initial conditions’, ‘Perform process’)
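
As a minimal sketch of the three parts of a loop, the fragment below adds up the integers 1 to 5. It anticipates the WHILE statement (Section 6.3.1) and the BEGIN and END brackets (Section 6.4); the program name ANATOMY and the variables I and SUM are invented for this illustration.

```pascal
program anatomy;
(* illustrative sketch only: shows initialization, exit test and loop body *)
var i, sum : integer;
begin
    i := 1;  sum := 0;           (* initialization phase *)
    while i <= 5 do              (* test: carry on while this holds *)
        begin                    (* loop body *)
            sum := sum + i;
            i := i + 1           (* alters a variable used in the test *)
        end;
    writeln(sum:1)
end.
```

This should print 15. If the line that increases I were left out, the test would never fail and the loop would run for ever.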


Embedding refers to the containment of one statement, or many, within 
another. All programming languages provide ways of packaging up a group of 
statements or instructions to form a unit. This is most important if the program 
is to have a coherent modular structure, especially for large programs. We introduce 
embedding in this chapter, and will have more to say about it in Chapters 7 and 12. 


6.2 Selection 


Pascal offers two ways of making a choice, the IF statement and the CASE 
statement. The IF statement is simpler (see Fig. 6.2). An example is 


    if a = 0 then write('zero');

and a more complex example, using ELSE, is

    if a < 0 then
        write('negative')
    else
        write('positive');

where there is no semicolon before ELSE, please note, as the complete IF
statement does not end until after the ELSE clause.


Figure 6.2 Syntax of IF (syntax diagram: IF condition THEN statement, optionally followed by ELSE statement)


It will be seen that IF provides a two-way choice. IF-THEN is a degenerate case 
of IF-THEN-ELSE where the second alternative is to do nothing at all. 


6.2.1 Conditions 
The condition between IF and THEN is an expression of type BOOLEAN. The 
simplest form of a BOOLEAN expression is a constant 


true 
false 


or a variable 
ok 


(assuming var ok : boolean in the declarations). 
There are also six relational operators which compare numeric or other ordered 
scalar values to give a BOOLEAN result. 


    >     greater than
    <     less than
    =     equal
    <>    not equal
    >=    greater than or equal to
    <=    less than or equal to


Thus the value of 1 <= 2 is TRUE and 4 > 7 is FALSE. 
Boolean expressions may be linked by the three logical operators 


not 
and 
or 


of which NOT takes one operand and the others take two. For example

    b1 and b2

where B1 and B2 are of type BOOLEAN, is true only if both operands are true.

The meanings of these operators are most precisely defined by truth tables.
(T means true, F means false.)


    R   NOT R        L   R   L OR R        L   R   L AND R

    T     F          T   T     T           T   T      T
    F     T          T   F     T           T   F      F
                     F   T     T           F   T      F
                     F   F     F           F   F      F


Observe that OR is the ‘inclusive’ or. 
As with arithmetic operators, the question of precedence arises. The order of 
priority is as follows. 


not 
and 
or 


< > = <> <= >= 


In other words all relational operators are of lower precedence than all logical 
operators (and, incidentally, than all arithmetic operators). 
This can cause problems, for if A and B are integers

    a < b or b = 0

is not a valid Pascal expression. It should be written

    (a < b) or (b = 0)

to prevent OR binding as

    (a < (b or b)) = 0

which is not only senseless but wrong, because B is numeric and OR must have
BOOLEAN values on its left and right.
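
A short sketch of the bracketed form in use may help. The program name PRECDEMO and the values given to A and B are assumptions made purely for illustration.

```pascal
program precdemo;
(* illustrative sketch: relational tests bracketed before being joined by OR *)
var a, b : integer;
begin
    a := 3;
    b := 0;
    if (a < b) or (b = 0) then
        writeln('condition holds')
    else
        writeln('condition fails')
end.
```

Here A < B is false but B = 0 is true, so the program should print "condition holds". Removing the brackets should make the compiler reject the program, for the reason given above.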


6.2.2 Nesting 

The syntax of IF states that a statement follows the keywords THEN and ELSE. 
If the statement after THEN or ELSE is another IF statement we have what is 
called a ‘nested’ IF. 


    if exam > 75 then
        writeln('distinction')
    else
        if exam > 49 then
            writeln('pass')
        else
            failures := failures + 1;

Here the statement after ELSE is itself an IF statement. When IFs are nested it
may not be obvious which ELSE belongs to which THEN. The rule is that an
ELSE is always paired with the nearest preceding THEN that does not already
have an ELSE of its own. Thus

    if n > 0 then
        if (m div n) > n then
            m := m - n
        else m := m + n;

will do M := M+N only if N > 0 is true and (M DIV N) > N is false. In other
words the ELSE is part of the second IF statement, and if N <= 0 nothing will
be done at all. (Can you see why

    if (n > 0) and ((m div n) > n) then
        m := m - n
    else
        m := m + n;

would not do the job?)
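
If the ELSE is meant to belong to the outer IF instead, the inner IF can be wrapped in the BEGIN and END brackets introduced in Section 6.4. The sketch below shows one way of doing so; the program name PAIRING and the starting values are invented for illustration.

```pascal
program pairing;
(* illustrative sketch: BEGIN..END makes the ELSE belong to the outer IF *)
var m, n : integer;
begin
    m := 20;
    n := -5;
    if n > 0 then
        begin
            if (m div n) > n then
                m := m - n
        end
    else
        m := m + n;      (* reached only when n <= 0 *)
    writeln(m:1)
end.
```

With these values the outer test fails, so the ELSE branch runs and the program should print 15. Without the brackets the ELSE would pair with the inner IF, nothing would be done, and M would remain 20.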


It is good practice to line up each ELSE with the IF to which it belongs, as 
above. The template 


    IF condition THEN
        true-path
    ELSE
        false-path


is recommended as a guide, to clarify what is intended. 
If you find yourself lost in a tangled thicket of nested IFs it probably 
indicates a bad program design: a structural re-think may be called for. 
It seems that people find a chain of conditions like 


    if c1 then
        s1
    else if c2 then
        s2
    else if c3 then
        s3
    else s4


easier to follow than a deeply nested construction like 


    if c1 then
        if c2 then
            if c3 then s1
            else s2
        else s3
    else s4


which implies that THEN-IF causes more trouble than ELSE-IF.
However a long chain of ELSE-IFs may indicate a multi-way choice, which 
could be handled more conveniently by the CASE statement. 


6.2.3 The CASE statement


The CASE statement is a special construction for multi-way decisions. Its syntax 
is outlined in Fig. 6.3. 

A CASE statement was used in program BIRTHDAY at the beginning of 
Chapter 2, to print ‘SUNDAY’, ‘MONDAY’ etc. depending on the day number.

The expression between CASE and OF is known as the selector. The constants 
preceding the colon in front of each statement between OF and END are called 
case labels. The selector is evaluated and then the single statement with a case 
label equal to that value is executed. None of the other statements between OF 
and END are executed. The dummy case label OTHERWISE provides for the
situation where no case label equals the selector value. If OTHERWISE were
omitted and no case label matched the selector, the CASE statement would cause
an error at run-time in most versions of Pascal. (Some versions of Pascal lack
the OTHERWISE construct.)

The selector expression may be of any scalar type except REAL and the case 
labels must be of the same type. No case label constant may appear more than 
once in the same CASE statement. 

An example, in which the expression is simply a CHAR variable, may make 
things more concrete. 


    read(ch);
    write(ch, ' is a ');
    case ch of
        'a','e','i','o','u' : writeln('vowel');
        'w','y' : writeln('semi-vowel');
        '.',',',';',':','!','?' :   (* label list illegible in the source; this list is an assumption *)
            writeln('punctuation mark');
        otherwise:
            if (ch >= 'a') and (ch <= 'z') then
                writeln('consonant')
            else writeln('strange character');
    end; (* of case *)


The purpose of this little extract is to classify the character CH. 

Note that many case labels may precede each statement within the CASE 
statement: this allows a variety of values to lead to the same action. The CASE 
statement is particularly useful when selecting from various actions on the basis 
of values which do not fall into simple consecutive ranges, and where several 
values map onto one action. 
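
For versions of Pascal that lack OTHERWISE, one possible approach is to guard the CASE statement with an IF so that the selector can never fall outside the labels. The sketch below assumes a hypothetical day numbering (1 for Sunday to 7 for Saturday); the program name DAYTYPE is likewise invented.

```pascal
program daytype;
(* illustrative sketch: an IF guard stands in for OTHERWISE *)
var day : integer;
begin
    day := 6;                      (* assumed numbering: 1 = Sunday, 7 = Saturday *)
    if (day >= 1) and (day <= 7) then
        case day of
            1, 7 : writeln('weekend');
            2, 3, 4, 5, 6 : writeln('weekday')
        end
    else
        writeln('not a day number')
end.
```

With DAY set to 6 this should print "weekday". Every value that passes the guard has a case label, so the run-time error described above cannot arise.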


Figure 6.3 The CASE statement (syntax diagram: CASE expression OF, labelled statements, optional OTHERWISE part, END)


6.3 Repetition 


Pascal provides three statements specifically for looping. The WHILE and REPEAT 
statements are for loops where the number of repetitions is not known but a 
terminating condition can be specified. The FOR statement is for loops based on
a counter, or loops that step through a known sequence of values.


6.3.1 The WHILE statement 


This is actually the most general of the looping constructs in Pascal: the other two 
may be mimicked using WHILE. Its syntax is as shown in Fig. 6.4. 


Figure 6.4 WHILE syntax (syntax diagram: WHILE expression DO statement)


The BOOLEAN expression after WHILE is the continuation test. What happens 
is that the statement after DO, the body of the loop, is executed again and again 
as long as the continuation test is satisfied. When it becomes false the program 
continues with the next statement after the WHILE statement. If the condition is 
false to begin with, the loop body is never executed. 

You should be able to see the difference between 


    if a > 0 then
        write(a);


and


    while a > 0 do
        write(a);


by now. Do not try the latter version: if A exceeds zero it will waste a lot of paper! 
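
By contrast, here is a sketch of a WHILE loop that does terminate, because its body alters the variable tested. It uses the BEGIN and END brackets explained in Section 6.4; the program name HALVING and the variable STEPS are invented for illustration.

```pascal
program halving;
(* illustrative sketch: counts how often 100 can be halved before reaching zero *)
var a, steps : integer;
begin
    a := 100;
    steps := 0;
    while a > 0 do
        begin
            a := a div 2;          (* brings A closer to failing the test *)
            steps := steps + 1
        end;
    writeln(steps:1, ' halvings')
end.
```

A takes the values 50, 25, 12, 6, 3, 1 and 0 in turn, so the program should print "7 halvings".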


6.3.2 The REPEAT statement


The REPEAT statement syntax is illustrated in Fig. 6.5.


Figure 6.5 REPEAT (syntax diagram: REPEAT, one or more statements, UNTIL condition)


There are three main differences between this statement and WHILE: 


(1) With REPEAT the exit test is performed at the end of the loop not the 
beginning, so the body is always performed at least once; 

(2) The exit condition is fulfilled when the BOOLEAN expression is TRUE 
(as opposed to FALSE with WHILE); 

(3) There may be more than one statement between REPEAT and UNTIL 
(another example of nesting) whereas there is only one statement after 
DO forming the body of a WHILE loop. 


For example, 


    writeln('count-down has begun....');
    secs := 10;   (* secs is an integer variable *)
    repeat
        writeln(secs:4);
        secs := secs - 1;
    until secs = 0;
    writeln('zero');
    writeln('we have liftoff!');


would reel off a rocket’s blastoff sequence; while the little program below prints an
approximation to π (pi).


program pivalue;
(* computes pi from the series
   pi/4 = 1/1 - 1/3 + 1/5 - 1/7 + 1/9 ... *)

const tiny = 0.000002;   (* determines precision *)

var
    n : integer;
    pi, term : real;

begin
    writeln('computation of pi:');
    writeln;

    pi := 0.0;
    n := 1;   (* initial values *)

    repeat
        term := 4/n - 4/(n+2);   (* two terms of the series taken together *)
        pi := pi + term;
        n := n + 4;
    until term < tiny;

    writeln('after', n div 4 :7, ' cycles,');
    writeln('pi = ', pi :12:7);
end.

The output from running this on the 380Z microcomputer is reproduced below. 


COMPUTATION OF PI: 


AFTER 501 CYCLES, 
PI = 3.1404760


The point of using REPEAT here is that the loop ends not after a predetermined 
number of cycles, but when TERM is small enough to be negligible. 


6.3.3 The FOR statement


The FOR statement is designed for stepping upwards or downwards through a 
series of values one by one. This is such a common task that it has its own 
statement in Pascal (see Fig. 6.6). 

Figure 6.6 FOR syntax (syntax diagram: FOR variable := expression, then TO or DOWNTO, expression, DO statement)


The variable after FOR is called the control variable, or sometimes the index.
This variable will be given all the values from the first (the expression after the :=
symbol) to the last (the expression after TO or DOWNTO) including both these 
terminal values, in succession. For each new value of the control variable the 
statement after DO will be executed once. The word TO implies a sequence up 
from the first value to the last. The word DOWNTO indicates a descending 
sequence. 

If the first value already exceeds the last (with TO) or is already less than the 
last (with DOWNTO) the statement after DO is not executed even once. 
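
The point about empty ranges can be checked with a small sketch such as the following, in which neither loop body should ever run. The program name EMPTYFOR and the variable COUNT are invented for illustration.

```pascal
program emptyfor;
(* illustrative sketch: FOR loops whose range is empty *)
var i, count : integer;
begin
    count := 0;
    for i := 5 to 4 do             (* first value already exceeds the last *)
        count := count + 1;
    for i := 1 downto 2 do         (* first value already less than the last *)
        count := count + 1;
    writeln('body executed ', count:1, ' times')
end.
```

This should print "body executed 0 times".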

The ‘rocket launch’ shown in Section 6.3.2 as a REPEAT example is actually 
more naturally written as a FOR loop. 


    writeln('count-down has begun....');
    for secs := 10 downto 1 do
        writeln(secs:4);
    writeln('zero');
    writeln('we have liftoff!');


In a FOR loop the number of repetitions can be determined in advance, without 
executing the loop body. It is used very frequently for counting up and down the 
integers, though other ordinal types may be used as in 


    for c := 'a' to 'z' do (* something *)


which steps through the characters ‘A’ to ‘Z’ inclusive. (In the ISO/ASCII 
character set these would be the 26 letters of the alphabet, though on some 
computers, notoriously IBM machines, other characters would also fall into this 
range.) 
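
Section 6.3.1 remarked that the other looping statements may be mimicked using WHILE. As a sketch of what that involves, here is the count-down again with the work of the FOR statement done by hand; the program name MIMIC is invented, and the BEGIN and END brackets are explained in Section 6.4.

```pascal
program mimic;
(* illustrative sketch: a WHILE loop doing the job of FOR ... DOWNTO *)
var secs : integer;
begin
    secs := 10;                    (* FOR would set the first value *)
    while secs >= 1 do             (* FOR would test against the last value *)
        begin
            writeln(secs:4);
            secs := secs - 1       (* FOR would step the control variable *)
        end;
    writeln('zero')
end.
```

The output should be the numbers 10 down to 1 followed by "zero", as before. The FOR version is shorter and leaves less room for forgetting the initialization or the step.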


6.4 Embedding 


We have already had examples of nesting — one IF statement nested in another, 
an IF statement forming part of a CASE statement, a WRITE statement within 
a FOR statement and so on. Pascal also allows another kind of embedding in 
which several statements may be grouped together and treated as a single unit. 

You might have thought that REPEAT was more useful than WHILE on the 
grounds that many statements may be inserted between REPEAT and UNTIL 
but only one statement may follow DO as the loop body. This might make it 
seem that a WHILE loop can only perform a very simple action. 

However, in Pascal, any series of statements sandwiched between 


begin 
and 
end 


is treated, for syntactic purposes, as one statement. BEGIN and END are sometimes 
called statement brackets since they bracket together a group of statements into a
single compound statement. 

BEGIN and END are commonly used to make a WHILE or FOR loop control a 
process consisting of many steps. An example follows. 


    (* finding the average and maximum *)
    write('how many numbers (1 at least) ? ');
    read(n);
    read(maxi);  tote := maxi;
    for i := 2 to n do
        begin
            read(j);  tote := tote + j;
            if j > maxi then
                maxi := j;   (* new high value *)
        end;   (* closes for loop *)
    writeln('maximum of ', n, ' values was ', maxi);
    writeln('average was ', tote/n);


Be warned that it is the easiest thing in the world to write BEGIN then a series 
of statements and then forget to close with END. This is why the indentation 
conventions followed in this book and most other Pascal texts are not a luxury. 
By lining up each END with its BEGIN and indenting the statements contained by 
them the structure of the program is made transparent. If you do not adopt this 
practice yourself, or something very like it, you do so at your own peril. It will 
be your own fault when you cannot understand the statement grouping in your 
own program. 

For lazy programmers many systems have a utility program that will take a 
Pascal program as input and lay it out neatly with BEGINs aligned with ENDs, 
IFs with ELSEs and so on. If you cannot be bothered to do that yourself, then 
at least let the computer do it for you; and if you do not have such a program, 
perhaps now is the time to start thinking about writing one — using your own 
favourite format rules. 


6.5 Semicolons 


Having mentioned layout, it is time to say a word on punctuation. Newcomers
to Pascal often find the usage of the semicolon confusing at first. The main point
is that the semicolon is not used to end a statement but to separate it from the next
one. However, knowing this fact does not always help the beginner, and the
following simple rule will avoid some common problems:


a semicolon before ELSE is wrong; 
a semicolon before END is unnecessary. 

Placing a semicolon just in front of ELSE effectively chops the IF statement in two,
when it should be a single statement. Putting a semicolon before END is redundant 
because there is no following statement. But programs grow and change, so that is 
no guarantee that there will never be a following statement. This is why you will 
find that in this book there usually is a semicolon before each END — to allow for 
later insertions in a natural manner. In fact most extracts shown end in a semicolon 
implying that they are incomplete and that other statements may follow. 

Pascal handles ;; or ;END by assuming the existence of a dummy or void 
statement after the first semicolon. This will not make your programs inefficient, 
since the dummy statement does nothing. 
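
A tiny sketch shows both harmless cases together. The program name VOIDSTMT is invented for illustration.

```pascal
program voidstmt;
(* illustrative sketch: redundant semicolons produce empty statements *)
var n : integer;
begin
    n := 1;;                       (* an empty statement lies between the two semicolons *)
    if n = 1 then
        begin
            writeln('one');        (* semicolon before END: unnecessary but harmless *)
        end;
end.
```

This should compile and print "one". Putting a semicolon after the inner END and then adding an ELSE clause, on the other hand, would be rejected by the compiler.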


6.6 The dreaded GOTO 


We have dealt with constructs for decisions, loops and embedding in Pascal, but 
there is one other control construct we have not mentioned — the dreaded GOTO. 
This simple instruction has come in for more than its fair share of criticism in 
the computing literature (Dijkstra, 1968) on the grounds that its undisciplined use 
can lead to hair-raisingly messy programs. However the sparing use of GOTO does 
not make a good program into a bad one, nor does the mere elimination of GOTOs 
render a bad program good. While I recommend that you should not use GOTO 
where another kind of statement will do just as well, there is no point in making 
a fetish out of its avoidance. You must make your own judgement on its usefulness. 
And remember it is a controversial topic in computer science: on such trifles do the 
experts spend their energy! 

The GOTO is a simple jump instruction which transfers control to another part 
of the program. To use it one writes, for instance, 


goto 99; 


where 99 is a ‘label’. Labels are always unsigned integers and if a label is used as 
destination in a GOTO it must label a statement somewhere else in the program. 
To continue our example this means that either before or after the statement above 
some statement must be preceded by 99 and a colon, such as 


99: writeln(‘error exit!’); 


where the WRITELN has been labelled 99. 

One further step must be taken to make 99 a valid label: it must be declared.
It is declared, before CONST and after the program (or procedure or function)
heading, by the label declaration whose form is shown in Fig. 6.7. This allows a
list of labels to be declared. We could say

    label 1, 99;

for example.


Figure 6.7 LABEL declaration (syntax diagram: LABEL, then unsigned integers separated by commas, then a semicolon)

Using a GOTO is something of a last resort, but it has two main uses: 

(1) as an ‘emergency exit’ to the end of a program or procedure when things 
have gone so wrong they cannot be rectified (e.g. after a fatal input error); 

(2) as a way of exit from the middle of a loop. Although WHILE permits exit 
from the front of a loop and REPEAT from the end it is sometimes useful to quit 
from the middle: some loops are meant to be executed ‘n and a half’ times. Using 
GOTO this can be handled quite naturally, as sketched below. 


    repeat
        (* preliminary *)
        (* read and test data *)
        if (* bad data input *) then goto 20;   (* quit *)
        (* deal with input *)
    until (* normal exit condition *);
    20: (* go on from here *)


PS. If you ever find yourself using GOTO to jump out of a procedure or function 
(see Chapter 7) or into the middle of a loop you have probably made a bad design 
error: that really is bad practice (and some versions of Pascal forbid it altogether). 
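
To make the outline above concrete, here is a complete sketch with the label declared and a real exit condition. The program name MIDEXIT, the limit of 50 and the variable names are all invented for illustration, and some present-day compilers may need GOTO support to be switched on by a compiler option.

```pascal
program midexit;
(* illustrative sketch: leaving a REPEAT loop from the middle with GOTO *)
label 20;
var i, total : integer;
begin
    total := 0;
    i := 0;
    repeat
        i := i + 1;
        if i * i > 50 then goto 20;    (* quit from the middle of the loop *)
        total := total + i * i
    until i = 100;                     (* normal exit, not reached here *)
20: writeln('sum of squares not exceeding 50: ', total:1)
end.
```

The squares 1, 4, 9, 16, 25, 36 and 49 are added before the jump is taken, so the program should print a total of 140.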


6.7 Example program [DECLINE]


The program that follows uses some of the control constructs described in this 
chapter to show the depreciation of an asset over time. 

When a piece of equipment is purchased its book value is what it cost to buy. 
As time goes on it loses value until it reaches its salvage value, which is what it 
can fetch for scrap or in part-exchange when it is replaced. 

There are three main methods in use for writing off the cost of an asset over 
a period of time, none of which, incidentally, makes any adjustment for inflation. 

The straight-line method is the simplest. The asset’s value declines at a constant 
rate during the period concerned. 

The so-called ‘double declining balance’ method works by multiplying the 
current value by a fraction to produce the new value — i.e. by carrying forward 
only a fixed proportion of the current value from one year to the next — until 
the salvage value (rock bottom) is reached. The fraction is normally taken as 
2/LIFE where LIFE is the asset’s lifetime in years (assumed to be more than 2 
years). 

The sum-of-the-years method reduces the value by the greatest amount early
in the life of the asset and reduces it by less and less as time goes by, reflecting 
the fact that the major depreciation occurs soon after purchase. If the product 
is to last 4 years for instance it will lose 4 times as much in the first year, 3 times 
as much in the second year, and twice as much in the third year as in the last 
year, with the series of proportional losses 


4,3,2,1. 


The sum of these numbers is 4 + 3 + 2 + 1 = 10, so it will suffer 0.4, 0.3, 0.2, 0.1 of
its total loss in each succeeding year. In general, for N years the sum is


    n + (n-1) + (n-2) + ... + 1 = n*(n+1)/2.


(This is assigned to the variable called YSUM in the program.) 
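
The formula can be checked against plain addition with a sketch like the one below, here for a life of 7 years. The program name YSUMCHECK and the variables K and TOTAL are invented for illustration.

```pascal
program ysumcheck;
(* illustrative sketch: the sum of the years by addition and by formula *)
var life, k, total : integer;
begin
    life := 7;
    total := 0;
    for k := 1 to life do
        total := total + k;
    writeln('by addition: ', total:1);
    writeln('by formula:  ', life * (life + 1) div 2 : 1)
end.
```

Both lines should show 28, so an asset with a 7-year life loses 7/28 of its total depreciation in the first year, 6/28 in the second, and so on.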

Since the way an asset depreciates can have a considerable effect on a company’s 
tax liability it is not unrealistic to imagine a program such as the one that follows 
being used interactively to try out the consequences of adopting one method or 
another and of spreading the depreciation over various intervals. 


6.7.1 Program listing 


program decline;
(* calculates and prints depreciation schedule for an asset *)
var cost,salvage,book,loss,totaldep,rate,ysum,diff : real;
    life,year : 0..9999;
    mode : 1..3;   (* number of method chosen *)
begin
    (* user input *)
    write('cost of asset ?');  read(cost);
    write('salvage value ?');  read(salvage);
    write('life in years ?');  read(life);
    writeln('method of depreciation : ');
    write('1=straight line, 2=declining, 3=sum of years ? ');
    read(mode);
    (* headings *)
    writeln;
    writeln('years':10,'annual':10,'total':10,'book':10);
    writeln(' ':10,'depreciation':20,'value':10);
    writeln;
    (* initialization *)
    book := cost;
    loss := 0;  totaldep := 0;
    (* loss is depreciation this year,
       totaldep is total depreciation to date *)
    year := 0;
    rate := 2 / life;
    ysum := life*(life+1)/2;   (* sum of years *)
    diff := cost - salvage;
    repeat   (* main loop *)
        writeln(year:10,loss:10:2,totaldep:10:2,book:10:2);
        case mode of
            1: loss := diff / life;
            2: begin
                   loss := rate * book;
                   if (book-loss) < salvage then
                       loss := book - salvage;
                   (* correction for overshoot in mode 2 only *)
               end;
            3: loss := (life-year)/ysum*diff;
        end;   (* of case *)
        book := book - loss;
        totaldep := totaldep + loss;
        year := year + 1;
    until year > life;
    writeln('end of job.');
end.


6.7.2 Trial runs 
Three runs of the program on the DEC System-10 are shown below. 


COST OF ASSET ? 1000
SALVAGE VALUE ? 200
LIFE IN YEARS ? 7
METHOD OF DEPRECIATION :
1=STRAIGHT LINE, 2=DECLINING, 3=SUM OF YEARS ? 1

      YEAR    ANNUAL     TOTAL      BOOK
                  DEPRECIATION     VALUE

         0      0.00      0.00   1000.00
         1    114.29    114.29    885.71
         2    114.29    228.57    771.43
         3    114.29    342.86    657.14
         4    114.29    457.14    542.86
         5    114.29    571.43    428.57
         6    114.29    685.71    314.29
         7    114.29    800.00    200.00
END OF JOB.


COST OF ASSET ? 1000
SALVAGE VALUE ? 200
LIFE IN YEARS ? 7
METHOD OF DEPRECIATION :
1=STRAIGHT LINE, 2=DECLINING, 3=SUM OF YEARS ? 2

      YEAR    ANNUAL     TOTAL      BOOK
                  DEPRECIATION     VALUE

         0      0.00      0.00   1000.00
         1    285.71    285.71    714.29
         2    204.08    489.80    510.20
         3    145.77    635.57    364.43
         4    104.12    739.69    260.31
         5     60.31    800.00    200.00
         6      0.00    800.00    200.00
         7      0.00    800.00    200.00
END OF JOB.


COST OF ASSET ? 1000
SALVAGE VALUE ? 200
LIFE IN YEARS ? 7
METHOD OF DEPRECIATION :
1=STRAIGHT LINE, 2=DECLINING, 3=SUM OF YEARS ? 3

      YEAR    ANNUAL     TOTAL      BOOK
                  DEPRECIATION     VALUE

         0      0.00      0.00   1000.00
         1    200.00    200.00    800.00
         2    171.43    371.43    628.57
         3    142.86    514.29    485.71
         4    114.29    628.57    371.43
         5     85.71    714.29    285.71
         6     57.14    771.43    228.57
         7     28.57    800.00    200.00
END OF JOB.


6.8 Exercises 


The following exercises can be answered using only the material covered so far. 


1. Easter falls on the first Sunday following the first full moon on or after the 
Vernal Equinox, 21 March. The following algorithm will calculate, for a given year 
Y (such that Y > 1582 and Y < 4903), the date of Easter. 

The variables used can be briefly described as follows: CENT is the century;
GREG is the ‘Gregorian correction’, i.e. the number of years divisible by 4, such
as 1800 and 1900, when a leap year was not held; GOLD is the ‘golden number’
which is used to determine the position of the calendar moon; C is the ‘Clavian
correction’; and MOON is the epact, or age of the moon on 1st January.


(a) cent := y div 100 + 1;
(b) greg := trunc(3*cent/4) - 12;
(c) gold := y mod 19 + 1;
(d) c := (8*cent+5) div 25 - 5 - greg;
(e) e := 5*y div 4 - greg - 10;
(f) moon := (11*gold + 20 + c) mod 30;
(g) if (moon=25) and (gold>11) or (moon=24) then moon := moon + 1;
(h) d := 44 - moon;
(i) if d < 21 then d := d + 30;
(j) d := d + 7 - (d+e) mod 7;


Now if D is less than or equal to 31 Easter is on March D, otherwise it is on 
April D-31. 

Look back at the Zeller program in Chapter 2, if you need guidance on 
producing a program of your own to determine and print the date of Easter for 
a series of years from Y1 to Y2 where Y1 and Y2 are given by the user. 


2. Write a program to produce a mile/kilometre conversion chart for distances up 
to 100 km. Use the fact that 1 mile = 1.609 344 kilometres. 
The table should begin as follows.


Miles Kilometres 


0.6214 1.0000 
1.0000 1.6093 
1.2428 2.0000 
1.8641 3.0000 
2.0000 3.2187 


In other words the problem is to coordinate two separate series — one stepping 
through the whole numbers of miles and the other stepping through the kilometres 
one by one — in increasing order of distance. This process has much in common 
with the merge technique, which we will come across later. 

The difficult part is to ensure that the two separate series are properly 
interleaved. 


7
Procedures and functions


The concept of the subprogram was described in print by Maurice Wilkes and 
associates as early as 1951 in connection with the EDSAC at Cambridge University, 
which had the distinction of being the first fully electronic stored-program computer 
to operate (Wilkes et al., 1957). In those days subprograms were termed ‘subroutines’, 
but whatever the name the concept is crucial to programming because it allows a 
large program to be built up piece by piece. 

In Pascal there are two kinds of subprogram — procedures and functions. We 
have already encountered a form of embedding (Section 6.4), but procedures and 
functions take this idea further by creating program units which can be named and 
referred to by name. This enables programs to be designed in a hierarchical fashion, 
and hierarchies appear to be the natural way for the human mind to cope with 
complexity (see Section 1.3). 

A program written as a single long sequence of instructions is said to be 
‘monolithic’; by contrast a program which is put together out of many procedures 
and/or functions is said to be ‘modular’. Often the ‘modules’ of such a program are 
themselves composed of lower level modules, and so on down to the lowest 
(elementary) level — which introduces the hierarchical structure. It is the contention 
of most enlightened programmers (including me) that modular programs are easier 
to write, easier to understand and more likely to work than monolithic ones. 


7.1 Procedures


A procedure in Pascal is a named piece of program which carries out a particular 
task. Like everything else in Pascal it must be declared before it is used; so the first 
thing to realize is that there is a difference between defining a procedure and using 
it. 

Each procedure is defined only once; it may be used many times. Procedure (and 
function) definitions appear after the VAR declarations and before the first BEGIN 
of a program. They are the last among the declarations. In fact the order of
declarations is LABEL, CONST, TYPE, VAR, PROCEDURE/FUNCTION. The
syntax of a procedure definition is given in Fig. 7.1.


Figure 7.1 Procedure definition (syntax diagram: PROCEDURE identifier, with an optional parameter list giving identifiers and their type identifiers)


Once it has been declared a procedure may be used in subsequent parts of the 
program simply by writing its name (followed by any parameters it requires, as 
explained in Section 7.2). It is as if a new kind of statement, to add to IF, CASE, 
FOR and the rest, has been introduced into the language. Such a statement is 
termed the procedure ‘call’ or ‘invocation’. What happens at that point, in essence, 
is that the statements making up the procedure ‘body’ are executed (with parameter 
substitution if necessary) and then control returns to the statement following the 
procedure call. The procedure body consists of the statements in its definition 
between BEGIN and END. 

For example, suppose we wanted to print a line of hyphens at various points in 
our output to mark off different sections. The sensible thing to do is to define a 
procedure for that purpose and use it where necessary. You will appreciate why if 
you glance at the two versions of the same simple program overleaf — the first 
without procedures, the second with one. 


program easy;   (* table of square-roots and logs *)
var j : 0..100;
begin
    for j := 1 to 80 do write('-');
    writeln;
    (* heading *)
    writeln('table of square roots:');  writeln;
    for j := 1 to 100 do
        writeln(j:8,sqrt(j):10:4);
    for j := 1 to 80 do write('-');  writeln;
    (* part 2 *)
    writeln('table of natural logarithms:');  writeln;
    for j := 1 to 100 do
        writeln(j:8,ln(j):10:4);
    writeln;
    for j := 1 to 80 do write('-');  writeln;
end.


program veryeasy;   (* modular version of above *)
var j : 0..100;

procedure drawline;
const wide = 80;
begin
    for j := 1 to wide do write('-');
    writeln;
end;   (* of procedure definition *)

begin   (* main program *)
    drawline;   (* first procedure call *)
    writeln('table of square roots:');  writeln;
    for j := 1 to 100 do
        writeln(j:8,sqrt(j):10:4);
    drawline;
    (* part 2 *)
    writeln('table of natural logarithms:');  writeln;
    for j := 1 to 100 do
        writeln(j:8,ln(j):10:4);
    drawline;
end.


Both these generate the same output, but the second is less cluttered. In 
particular the main program of the second is easier to read since it is smaller (and 
more descriptive by virtue of the fact that ‘DRAWLINE’ describes what is happening 
better than “FOR J:=1 TO 80 DO WRITE(‘-’); WRITELN” does).

When we introduced DRAWLINE we took the opportunity to insert a constant 
WIDE=80 which is ‘local’ to the procedure. It is possible to define constants, types,
variables and even procedures and functions within a procedure immediately after 
the procedure heading just as it is after the heading of the main program (see 
Fig. 7.1). 

Here we have assumed that an output line on the terminal is 80 characters long. 
So the result of the call 


drawline; 


is simply to fill the screen with one line of dashes. However this is still rather 
inflexible, as terminal widths of 32, 40, 64, 72 and occasionally 132 are found. 

In fact we can improve DRAWLINE in several ways. The first problem is that it
alters J. J is a variable of the main program, which, since it is declared prior to the 
definition of DRAWLINE, can be used by that procedure. It is known as a ‘global’ 
variable; whereas WIDE is a ‘local’ constant. A procedure that alters the value of 
any global variables, as this one does, is said to have ‘side effects’. Normally such 
side effects are undesirable, since they can lead to unintended interactions between 
parts of the program that are meant to be independent. We could avoid this 
possibility by declaring a local variable to be used in the FOR loop of the procedure. 

The second problem, already alluded to, is that DRAWLINE always draws 
exactly 80 dashes. We might want it to be more flexible — to draw any number of 
dashes, or indeed of asterisks or underlines or whatever other character takes our 
fancy. 

We can incorporate both these suggested improvements as follows. 


procedure drawline(wide : integer; c : char);
(* draws line of length wide composed of character c *)
var j : integer;
begin
  for j := 1 to wide do
    write(c);
  writeln;
end;   (* of definition *)


Now when we call it by 
drawline(64, '*');

it will produce 64 asterisks, while the call
drawline(100, '=');


will give us a line of 100 equal signs (quite effective for underlining, by the way). 
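
As a minimal sketch of the revised procedure in use, the short program below calls it twice. The program name BANNER and the particular widths are assumptions chosen for illustration; they do not come from the examples above.

```pascal
program banner;
(* illustrative sketch: calls the parameterized drawline *)


procedure drawline(wide : integer; c : char);
(* draws line of length wide composed of character c *)
var j : integer;
begin
  for j := 1 to wide do
    write(c);
  writeln;
end;


begin   (* main program *)
  drawline(40, '*');
  writeln('table of square roots:');
  drawline(22, '=');   (* underlines the 22-character heading *)
end.
```

Notice that this main program declares no variable J at all: the loop counter now belongs wholly to the procedure.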


7.2 Parameters 


We have, in effect, defined not a single process but a class of computations producing
lines of various lengths composed of various characters, one of which we select by
giving particular values to the ‘parameters’ WIDE and C.

The parameter mechanism is a means of passing information into (and perhaps 
receiving information back from) a procedure or function. 

What happens when we execute 


drawline(120, '-');


is that WIDE is assigned the value 120 and C is given the value '-' (hyphen or
minus-sign). Then within the procedure body WIDE and C can be used just like local
variables, i.e. variables that belong to the procedure not the program that encloses
it. The procedure can even alter the values of parameters if desired, though this will
have no effect on the calling program.

Notice that the order of parameters in the procedure heading and in the 
procedure call determines which parameter will get which value. Observe also that 
each parameter must have its type declared, as all variables must have a type, so the 
value passed to a parameter must be of compatible type — namely one that could 
be assigned to it in an assignment statement. 

It is also possible to send results back to the calling program via parameters. For 
instance we might write a procedure 


procedure yardmile(ym : real);
(* converts ym from yards to miles *)
begin
  ym := ym / 1760.0;
  (* 1760 yards in a mile *)
end;


and hope to use it to change its parameter. But if we call it by 
yardmile(dist); 


where DIST is a REAL variable, nothing will happen to DIST. This is because the 
normal mode of passing parameters to a procedure is one-way only: the value is 
passed in to initialize the parameter, but changes to the parameter’s value do not 
have any effect in the main program. If we want the final value of a parameter to 
be passed back to the calling program then we must put VAR in front of the 
parameter list concerned in the procedure heading. The correct version of 
YARDMILE is thus 


procedure yardmile(var ym : real);
(* changes ym from yards to miles *)
begin
  ym := ym / 1760.0;
end;


which ensures that the revised value of YM is returned to the variable concerned in 
the main program. The word VAR reminds us that the value in the procedure call
must be a variable, not an expression or constant. It would make no sense to say 


yardmile(y1 + y2); 
or 
yardmile(1000.0); 


for what would be meant by altering the value of 1000.0? Would every subsequent 
reference to 1000.0 actually mean 0.5682? If so, chaos would ensue because 
constants could no longer be relied upon to keep their values. 
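
The difference between the two modes can be seen side by side in the following sketch. The names PASSING, BYVALUE and BYVAR are illustrative assumptions; the two procedure bodies are the two versions of YARDMILE given above.

```pascal
program passing;
(* illustrative sketch: value parameter versus VAR parameter *)
var dist : real;


procedure byvalue(ym : real);
begin
  ym := ym / 1760.0;   (* alters only the procedure's own copy *)
end;


procedure byvar(var ym : real);
begin
  ym := ym / 1760.0;   (* alters the caller's variable *)
end;


begin
  dist := 3520.0;
  byvalue(dist);
  writeln(dist:10:2);   (* dist is still 3520 yards *)
  byvar(dist);
  writeln(dist:10:2);   (* dist has become 2 miles *)
end.
```

Only the second call changes DIST, because only BYVAR has VAR in its parameter list.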


7.3 Local variables


In the second version of procedure DRAWLINE (Section 7.1) there is a local 
variable called J. This variable is wholly private to DRAWLINE itself. Note 
especially that it is different from J in the main program. Altering J in the procedure 
has no effect on J in the calling program, and vice versa. The whole point of having 
local variables is to avoid such unwanted side effects. 

To clarify the question of local and global entities we will consider the following 
skeleton. 


program nest;
(* only partly shown *)
var a,b : integer;


procedure nestegg;
var a,x : char;
begin
  a := '1';
  x := '?';
  b := b+1;
end;


begin   (* main program *)
  a := 0;   b := 100;
  nestegg;   (* call the procedure *)
  writeln(a,b);
end.


A calling program or procedure may not ‘look in’ to the values of a procedure’s 
local variables, but a procedure may sometimes ‘look out’ at the enclosing program’s 
global variables. (Note the asymmetry here.) Thus X is totally private to NESTEGG:
the main program cannot alter it or refer to it at all. B on the other hand is shared. 
Once a variable (or other identifier) such as B has been declared all following 
procedures may refer to it, unless they declare that identifier for some other 
purpose. The inner use of the same name masks out the outer one. This is what 
happens to A in the previous example. For the program A is an integer variable, but 
NESTEGG has its own A which is of type CHAR; and so it cannot affect the A in 
the main program. The changes made to A by NESTEGG are lost on exit from the 
procedure body, but changes to B are preserved. 

Thus after the procedure call the WRITELN will print 


0 101


as the current values of A and B.

The importance of using local variables to prevent subprograms having
unpredictable effects on one another is further discussed in Chapter 12.


7.4 Functions 


A function is similar to a procedure except that a function always returns a scalar
value. Thus a function definition resembles a procedure definition except that:


(1) the TYPE of the function’s result is specified; 
(2) the function name is assigned a value, as if it were a variable, to return 
the result to the calling program (or calling procedure or function). 


A function is also called differently. Whereas a procedure call is a statement in 
its own right, the function call is an expression. It is used anywhere that an 
expression of that type may be used. 

If we wanted to define a function to compute a speed in miles per hour given a 
distance travelled in metres and a time taken in seconds, we might declare the 
following. 


function speedmph(m : real; secs : integer) : real;
(* type of the function is real *)
var mile,hour : real;   (* local variables *)
begin
  mile := m / 1609.344;   (* miles covered *)
  hour := secs / 3600;   (* hours taken *)
  speedmph := mile / hour;   (* result *)
end;


The function could then be called by something like
rate := speedmph(dist,time);
or


writeln(m:8:2,' metres in ',s:8,' seconds is ',
        speedmph(m,s):8:2,' m.p.h.');


where DIST and TIME and M and S are appropriate main-program variables, of 
types REAL and INTEGER respectively. The second example shows that function 
calls need not be confined to the right-hand sides of assignments; they can appear 
wherever a value of the appropriate type is required. 

Thus a function computes a value by executing the Pascal statements making up 
the body of its definition. The value computed is assigned to the function name in 
order to return a result. 

We have already mentioned such functions as ROUND and SQR in Section 4.4. 
Those are predefined, built into the Pascal environment. The kind of functions we 
have been discussing here are those which the user defines. User-defined functions 
must be declared before they are called. 
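
To show the declaration and the calls together, here is a small complete program built around SPEEDMPH. The program name SPRINT and the test figures are assumptions made for the sake of illustration.

```pascal
program sprint;
(* illustrative sketch: function calls used as expressions *)
var rate : real;


function speedmph(m : real; secs : integer) : real;
var mile,hour : real;
begin
  mile := m / 1609.344;
  hour := secs / 3600;
  speedmph := mile / hour;
end;


begin
  rate := speedmph(1609.344, 240);   (* one mile in four minutes *)
  writeln(rate:8:2);   (* 15 m.p.h. *)
  if speedmph(100.0, 10) > rate then   (* call inside a condition *)
    writeln('the sprinter is faster');
end.
```

The function is declared before the main program that calls it, and the second call appears inside an IF condition rather than on the right of an assignment.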


7.5 Recursion 


A function or procedure may call another function or procedure, and so on, giving 
the program a layered structure. It is also possible for a function or procedure to 
call itself. This technique is called ‘recursion’. 

Recursion is nothing to be afraid of. It is a useful technique; but to the 
uninitiated it can sometimes seem like black magic. Because it is forbidden in such 
old-fashioned languages as Basic, Cobol and Fortran it has gained the reputation, 
erroneously, of being esoteric and difficult to understand. It is just another 
application of the well-tried programming philosophy ‘divide and conquer’. 

For example, here is a recursive function for calculating the highest common 
factor (or greatest common divisor) of two integers. 


function highfact(n,m : integer) : integer;
(* computes hcf recursively for positive integers *)
var h : integer;
begin   (* function body *)
  if m > n then   (* put the larger first *)
    h := highfact(m,n)   (* path 1 *)
  else
    if m <= 0 then
      h := n   (* path 2 *)
    else
      h := highfact(m,n mod m);   (* path 3 *)
  highfact := h;   (* h is result *)
end;   (* of definition *)


The important point in recursive subroutines is that there must be an ‘escape
clause’. In some simple cases there must be a non-recursive solution; and the 
recursive solution must tend to simplify the problem so that eventually it will 
become amenable to the non-recursive solution. Here the non-recursive route is 
chosen when M<=0 (PATH 2). 

The schematic of a worked numerical example, showing how the recursion winds 
and then unwinds, should make its operation clearer. 


hf(9,12)                  top level
  > hf(12,9)              path 1
    > hf(9,3)             path 3
      > hf(3,0)           path 3
      < hf := 3           path 2
    < hf := 3             path 3
  < hf := 3               path 3
hf := 3                   path 1
3                         final result


Not all recursive routines fall and rise in this symmetrical way, however. 
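
A short test harness makes it easy to check the function against the worked example. This is only a sketch: the program name HCFDEMO and the second and third test pairs are assumptions added for illustration.

```pascal
program hcfdemo;
(* illustrative sketch: exercising the recursive function *)


function highfact(n,m : integer) : integer;
var h : integer;
begin
  if m > n then
    h := highfact(m,n)
  else
    if m <= 0 then
      h := n
    else
      h := highfact(m,n mod m);
  highfact := h;
end;


begin
  writeln(highfact(9,12):4);    (* the worked example, giving 3 *)
  writeln(highfact(48,36):4);   (* gives 12 *)
  writeln(highfact(17,5):4);    (* gives 1: no common factor *)
end.
```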


7.5.1 Forward reference 


If two procedures are mutually recursive — that is if one calls the other and is called 
by it in return — one has to be defined first; therefore when it refers to the second 
one that will not yet be defined. To overcome this problem Pascal uses the reserved 
word FORWARD to allow a procedure name to be declared but postpone the rest 
of its definition. 

In effect the declaration is split in two: first comes the name and parameters 
without local variables followed by the word FORWARD instead of the body of 
the subprogram. Later the full declaration appears, except that no parameters are 
given the second time. An example can be seen in Section 16.4. 
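
A minimal sketch of the idea follows, using two mutually recursive functions that decide whether a non-negative integer is even or odd. The names PARITY, ISEVEN and ISODD are assumptions made for illustration. The parameters are omitted from the second heading as described above, though some compilers may expect the heading to be repeated in full.

```pascal
program parity;
(* illustrative sketch of FORWARD with mutual recursion *)
(* assumes n is never negative *)


function isodd(n : integer) : boolean; forward;


function iseven(n : integer) : boolean;
begin
  if n = 0 then iseven := true
  else iseven := isodd(n - 1);   (* isodd is known but not yet defined *)
end;


function isodd;   (* parameters are not repeated *)
begin
  if n = 0 then isodd := false
  else isodd := iseven(n - 1);
end;


begin
  if iseven(10) then writeln('10 is even');
  if isodd(7) then writeln('7 is odd');
end.
```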


7.6 Example program [CHANCES] 


It is often the case in experimentation that we wish to know the probability of a 
proportion of events falling into one class rather than another. We might want to 
know what the chance of getting 15 heads in 20 tosses of an unbiased coin was 
(about 0.015). Or a psychologist training rats to run a ‘T’ maze, where by turning 
one way the animal will receive some food pellets and by turning the other way it 
will be given an electric shock, may find that on the first attempt 48% of the rats 
choose the ‘rewarded’ side and 52% the ‘punished’ one. On their second run of the 
T-junction, let us say, 69% turn towards the food and only 31% towards the shock. 
Is this evidence of learning? Or does it prove that the world is run by white mice 
who spend their time devising fiendishly cunning behavioural tests for innocent 
psychologists? (Adams, 1979). 

We can calculate probabilities of this kind by reference to the binomial
distribution; and the program CHANCES, below, does just that. 

The probability of I successes from N trials (in the rat example a turn towards 
food would count as a ‘success’) where the probability of success on a single trial is 
known to be P is 


c(i,n) * p^i * (1-p)^(n-i)


where we have used ‘^’ to indicate raising to the power.
The notation C(I,N) stands for 


n! / (i! * (n-i)!)


which is the number of combinations of I from N — i.e. the number of different 
ways of selecting I objects from N objects, where I <= N. The exclamation mark 
signifies the factorial. Therefore N! equals N * (N-1) * (N-2) * ... * 1.

However factorials rapidly grow too big to handle: 6! = 720; 10! = 3 628 800; 
and the factorial of 14 is over 87 billion. We do not have to deal with such huge
numbers because a recurrence relation for C(I,N) exists, such that 


c(i,n) / c(i-1,n) = (n-i+1) / i.


Therefore we can work out C(I,N) iteratively, starting from the trivial fact that
C(0,N) is 1 by definition. All we need do is multiply by the ratio 


(n-k+1) / k


repeatedly with K rising from 1 to I. This is what the function BINOMIAL does in 
the example program. 
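
To see the recurrence at work on a small case, the sketch below builds up C(3,5) one step at a time. The program name STEPS and the choice of I=3, N=5 are assumptions made for illustration; the loop body is the same as the one in BINOMIAL.

```pascal
program steps;
(* illustrative sketch: builds c(3,5) by the recurrence relation *)
var b : real;
    k : integer;
begin
  b := 1.0;   (* c(0,5) is 1 by definition *)
  for k := 1 to 3 do
    begin
      b := b*(5-k+1)/k;   (* multiply by the ratio (n-k+1)/k *)
      writeln('c(',k:1,',5) = ',b:6:0);
    end;
end.
```

The successive values are 5, 10 and 10, since 1 * 5/1 = 5, then 5 * 4/2 = 10, then 10 * 3/3 = 10. No factorial is ever computed.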

Looking at the program, you will see that it makes use of three functions — 
BINOMIAL, PROB and EXPO. BINOMIAL is based on the recurrence relation just 
described, and avoids having to compute large factorials. PROB merely completes 
the probability calculations according to the formula given earlier. PROB contains 
embedded within it a local function definition, of EXPO. This is needed because 
Pascal lacks the exponentiation operator. Instead of using LN and EXP (as in 
Chapter 5) we raise to the power by repeated multiplication, since the exponent is 
always a whole number. Pay particular attention to the use of local variables in these 
functions. For example, there are three different I’s in the program. 

The main body of the program is reasonably straightforward. There is a REPEAT 
loop in which the user is required to give the number of successes and the total 
number of trials (variables R and N). If these are nonsensical the loop will terminate. 
Otherwise the number of combinations of R from N is computed and printed. Then 
the user is invited to supply the probability of success on any one trial. This enables 
the program to calculate the probability of exactly R out of N successes, and, more 
usefully, the chance of R or more successes in N trials. This involves summing the 
separate probabilities for R+1, R+2 . . . N successes using a FOR loop. 


7.6.1 Program listing


program chances;
(* program to compute success-rate probabilities *)
var i,n,r : integer;
    stop : boolean;
    c,p,pr : real;


function binomial(i,n : integer) : real;
(* number of ways of taking i objects from a set of n *)
(* returns real value for greater range *)
var b : real;   k : integer;
begin
  b := 1.0;
  for k := 1 to i do
    b := b*(n-k+1)/k;   (* recurrence relation *)
  binomial := b;   (* result *)
end;   (* of binomial function *)


function prob(p : real; i,n : integer) : real;
(* probability function containing nested function definition *)
var q : real;


  function expo(a : real; b : integer) : real;
  var e : real;   i : integer;
  begin   (* exponentiation *)
    (* b should not be negative *)
    e := 1.0;
    for i := 1 to b do
      e := e*a;
    expo := e;   (* a to the power b *)
  end;   (* inner, exponential function *)


begin   (* probability function *)
  q := 1.0 - p;
  prob := expo(p,i) * expo(q,(n-i));
end;   (* of prob *)


begin   (* main program *)
  stop := false;
  writeln('exact probability calculations:');

  repeat   (* main loop *)
    writeln;
    write('number of successes? ');   read(r);
    write('number of trials? ');   read(n);
    if (r>=0) and (n>0) and (r<=n) then
      begin
        c := binomial(r,n);
        writeln('there are ',c:15:0,' combinations of ',
                r:7,' from ',n:8);
        write('probability of success on each trial? ');
        read(p);
        while (p < 0.0) or (p > 1.0) do
          begin   (* bad input, retry *)
            writeln('silly! ',p:8:4,' must be 0.0 to 1.0!');
            write('try again: ');   read(p);
          end;
        pr := c * prob(p,r,n);
        writeln('probability of ',r:7,' successes in ',n:8,
                ' trials is');
        writeln(pr:12:8);
        for i := r+1 to n do
          begin   (* sum all more extreme probabilities *)
            c := binomial(i,n);
            pr := pr + c*prob(p,i,n);
          end;
        writeln('probability of ',r:7,' or more successes is');
        writeln(pr:12:8);
      end
    else   (* funny input *)
      begin
        writeln('impossible input values!');
        stop := true;
      end;
  until stop;

  writeln('program halts.');
end.


7.6.2 Sample run 


This program was run on the 380Z microcomputer. Some results are shown below. 


EXACT PROBABILITY CALCULATIONS:

NUMBER OF SUCCESSES? 8
NUMBER OF TRIALS? 16
THERE ARE 12870. COMBINATIONS OF 8 FROM 16
PROBABILITY OF SUCCESS ON EACH TRIAL? 0.1454545
PROBABILITY OF 8 SUCCESSES IN 16 TRIALS IS
0.00073328
PROBABILITY OF 8 OR MORE SUCCESSES IS
0.00085877

NUMBER OF SUCCESSES? 0
NUMBER OF TRIALS? 23
THERE ARE 1. COMBINATIONS OF 0 FROM 23
PROBABILITY OF SUCCESS ON EACH TRIAL? 0.2
PROBABILITY OF 0 SUCCESSES IN 23 TRIALS IS
0.00590293
PROBABILITY OF 0 OR MORE SUCCESSES IS
0.99999580

NUMBER OF SUCCESSES? 0
NUMBER OF TRIALS? 0
IMPOSSIBLE INPUT VALUES!
PROGRAM HALTS.


The two test cases chosen actually show the probability of winning a first dividend 
on the football pools, and the chance of selecting a group of 23 people at random 
with no left-handers among them. 

To win a jackpot on the pools (getting 8 ‘score draws’) you must have your 8 
correct forecasts on a week in which only 8 of the 55 matches on the coupon result 
in score draws. This is the probability of success 8/55 = 0.145 454 5. For the 
example we assume an entry (or ‘perm’) of 16 selections, rather a costly one in 
fact. The overall probability of success ends up as 0.000 858 77 which is almost 1 
in 1000. So you can expect one every 20 years if you enter regularly. (Can you spot 
a flaw in this argument? There nearly always is a catch in any application of pure 
mathematics to the sordid world of gambling.) 

The second test data is explained by the fact that the present author was teaching 
a class of 23 which turned out to include no one who wrote with the left hand. 

It is estimated that nearly 20% of the population is left-handed, so the probability 
of success is 0.2 and R=0 and N=23. The probability of zero successes is thus about 
0.0059, and in this case we are really interested not in the probability of 0 or more 
successes (which should be 1.0 but is computed as 0.999 995 8 because of round-off 
errors in floating point arithmetic) but in 0 successes precisely. (There could not be 
less than none!) The results suggest that this sample of 23 people is not picked from 
a population in which the probability of left-handedness is 0.2. 


7.7 Exercises


Here are some problems for you to practise on. They can be answered with the 
material covered so far. 


1. Write a program to compute the length of your life, in seconds. You give it 
today’s date and your date of birth — both in the form of three numbers (day, 
month, year) — and it tells you how many seconds you have lived. If you want to 
be really ambitious, extend it to accept the time on the 24-hour clock as well, i.e.
three additional numbers per date, namely hours, minutes and seconds past
midnight. 

The heart of the program should be a function that takes a date as input and 
yields the number of days since a particular milestone, such as 1752. That was the 
year when the Gregorian calendar was adopted in Britain. This decrees a leap year 
every four years except on years that are divisible by 100 and not by 400. 

Having converted both dates to days elapsed since 1752 the difference can be 
found simply by subtraction. But do not forget to validate the input and reject 
impossible dates. This validation will be greatly simplified if you define suitable 
data types. 


2. If you are paying off a mortgage monthly, do you know that the payment is 
correct? Without a computer, you will just have to take the lender’s word for it, 
unless you enjoy tedious mental arithmetic. Alternatively you could write a 
program to determine correct monthly mortgage payments. The program will read 
values for the following variables: 


loan   amount borrowed;
ai     annual interest rate in percent;
y      number of years to pay.


From these it can calculate and print the monthly payment required to pay the 
loan off in the prescribed time. 

Your solution can be based on the following analysis. Let P be the monthly 
payment and R be the monthly interest factor. Since the interest is calculated each 
month on any unpaid balance R = 1 + AI/1200, e.g. on 12% R is 1.01. At the end of 
one month the borrower owes LOAN*R - P pounds or dollars or whatever. After
two months the sum owed is


(loan*r - p)*r - p = loan*r^2 - p*r - p

and in general, after N months, the amount owed is
loan*r^n - p*(r^(n-1) + r^(n-2) + ... + r + 1).

Since nothing is owed once the loan has been paid off, the problem is then
reduced to solving for P in the equation
p = (loan*r^n) / (r^(n-1) + r^(n-2) + ... + r + 1)


where N is Y*12, the number of months over which the debt is to be repaid. Once
again we use ‘^’ for raising to the power.

The program should employ a function to sum the series of values R^I with I
ranging from 0 to N-1 (the denominator of the expression above). The function
EXPO from Section 7.6 could be useful here. Your monetary unit can be Altairan 
megabucks, the universal galactic currency. 


8
Arrays 


Now at last things start getting really interesting. In previous chapters the only kind
of data we could deal with has been scalar: numbers, characters and so forth.
With the array we enter the realm of data aggregation. This enables us to handle 
data that is grouped or structured in some way. In Pascal the array is said to be a 
‘structured’ data type. (Scalars belong to the ‘simple’ or ‘unstructured’ types.) 

Scalar data is indivisible: it cannot be subdivided into constituent parts. The array, 
however, is a collection of items. An array can be referred to as a whole by name or 
one of its elements can be picked out by an index or subscript that indicates its 
relative position. (Here the words ‘index’ and ‘subscript’ are used interchangeably.) 

Many absolutely fundamental computing tasks (for instance, sorting a set of
numbers into ascending or descending order) would be almost inconceivable
without arrays.


8.1 Array declaration


The array type is declared in terms of a base type and an index type. The syntax 
diagram in Fig. 8.1 spells this out. 


Figure 8.1 Array declaration 


The base type states what kind of components the array contains; the index type 
states what kind of values may be used to select individual components, and 
implicitly determines the number of elements in the array. The number of elements 
in a Pascal array is fixed by its definition, and they are all of the same type. 

Having set up an array it is possible to refer to any one of its elements by 
enclosing a subscript in square brackets after the array name. Thus


type r100 = array [1..100] of real;
var a : r100;


or, more simply, 
var a: array [1..100] of real; 

declares a 100-element array of REAL numbers and 
a[20] := 5.5; 

assigns 5.5 to element 20 of that array, while 
writeln(a[j+1] :10:4); 


prints the value of the J+1th element. If J=95 this is A[96], the 96th element. 

A subscript can be any expression of the index type, including of course a 
constant or variable. The word ‘subscript’ means something written below the line, 
but here as in many other places (cf. the division operator) the inflexibility of most 
computer I/O devices forces everything to be written on one level. 

The index type must be an ordinal type, i.e. a scalar other than REAL, hence 
there is an ordering to the elements of an array. Here the index type (1..100) was
specified by a subrange. This is very common, especially with subranges of the 
INTEGER type. 
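
The following sketch declares three arrays whose index types are, respectively, an integer subrange, a character subrange and an enumerated type. All the names in it (INDICES, COLOUR, READING, TALLY, STOCK) are assumptions chosen for illustration.

```pascal
program indices;
(* illustrative sketch: three arrays with different index types *)
type colour = (red, green, blue);
var reading : array [1..100] of real;
    tally : array ['a'..'z'] of integer;
    stock : array [colour] of integer;
    c : colour;
begin
  reading[20] := 5.5;   (* integer subscript *)
  tally['e'] := 0;   (* character subscript *)
  for c := red to blue do
    stock[c] := 10;   (* enumerated subscript *)
  stock[green] := stock[green] - 3;
  writeln(reading[20]:10:4, tally['e']:4, stock[green]:4);
end.
```

In each case the index type fixes both the permitted subscripts and the number of elements: 100, 26 and 3 respectively.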


8.2 An array application


We can see the usefulness of arrays by considering the evolution of a simple letter- 
counting program. 

We start with a program that reads some characters, up to and including the first 
fullstop, and then prints out the frequency of ‘E’ and also its proportional frequency.
At this stage it does not require the use of an array. 


program counter1;
(* tallies e *)
var ch : char;
    counter, n : integer;


begin
  writeln('type text ended by fullstop:');
  n := 0;   counter := 0;   (* initialize counters *)

  repeat
    read(ch);   n := n+1;
    if ch = 'e' then counter := counter + 1;
  until ch = '.';
  writeln(n:8,' characters read.');
  writeln('frequency of e is ',counter:10);
  writeln('percentage of e is',
          counter*100/n :10:2);
end.


But this task is unrealistically simple. Let us suppose we wish to count not just 
E’s but all vowels. This entails having five counters instead of one, and makes the 
coding somewhat more complex. 


program counter2;
(* tallies a,e,i,o,u without using an array *)
var ch : char;
    countera,countere,counteri,countero,counteru,n : integer;


begin
  (* first clear all counters *)
  n := 0;
  countera := 0;
  countere := 0;
  counteri := 0;
  countero := 0;
  counteru := 0;
  writeln('type text ended by fullstop:');

  repeat
    read(ch);   n := n+1;   (* next character *)
    case ch of
      'a': countera := countera + 1;
      'e': countere := countere + 1;
      'i': counteri := counteri + 1;
      'o': countero := countero + 1;
      'u': counteru := counteru + 1;
      otherwise:
        (* do nothing *)
    end;   (* case *)
  until ch = '.';

  writeln(n:8,' characters read.');
  writeln('letter':10,'frequency':10,'percent':10);
  writeln('a':10,countera:10,countera*100/n :10:2);
  writeln('e':10,countere:10,countere*100/n :10:2);
  writeln('i':10,counteri:10,counteri*100/n :10:2);
  writeln('o':10,countero:10,countero*100/n :10:2);
  writeln('u':10,counteru:10,counteru*100/n :10:2);
end.


The result of typing


let us go then, you and i 
when the evening is spread out against the sky 
like a patient etherized upon a table. 


to this program is printed below. 


     112 CHARACTERS READ.
    LETTER FREQUENCY   PERCENT
         A         8      7.14
         E        14     12.50
         I         7      6.25
         O         4      3.57
         U         4      3.57


The program is already getting unwieldy, but the whole approach based on many 
single variables (counters in this case) breaks down if we attempt to extend it to 
count not just vowels but all letters. We would need 26 separate variables. 

Employing an array, however, the program for this extended task actually 
shrinks. 


program counter3;
(* tallies a..z *)
var counter : array ['a'..'z'] of integer;
    ch : char;
    n : integer;


begin
  (* initialize array elements by a loop *)
  for ch := 'a' to 'z' do
    counter[ch] := 0;
  n := 0;   (* total character count *)
  writeln('type text ended by fullstop:');

  repeat
    read(ch);   n := n+1;
    if (ch >= 'a') and (ch <= 'z') then
      counter[ch] := counter[ch] + 1;   (* only count letters *)
  until ch = '.';
  writeln(n:8,' characters read.');
  writeln('letter':10,'frequency':10,'percent':10);
  for ch := 'a' to 'z' do
    writeln(ch:10,counter[ch]:10,
            counter[ch]*100/n :10:2);
end.


Here we use the fact that (in the ISO/ASCII code) characters ‘A’ to ‘Z’ form a
contiguous, ascending sequence. This makes it possible to test whether a character 
is a letter by 


if (ch >= 'a') and (ch <= 'z') then ...


as we will elsewhere in the book. 


8.3 Vectors and matrices


The arrays considered up to now have had only one subscript. From Fig. 8.1 you 
will see that it is possible to have more than one. 

An array with a single subscript is one-dimensional. It is commonly referred to 
as a ‘vector’. A vector can be visualized as a row of storage cells, each capable of 
holding one value of the base type. 

After 


type m = (jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec);
var days : array [m] of 28..31;
    this : m;


for this := jan to dec do
  case this of
    jan,mar,may,jul,aug,oct,dec:
      days[this] := 31;
    apr,jun,sep,nov:
      days[this] := 30;
    feb:
      days[this] := 28;
  end;   (* case *)


the state of vector DAYS can be illustrated as follows. 


DAYS

value      31  28  31  30  31  30  31  31  30  31  30  31
position  JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC


It is very important to distinguish clearly in your mind between the subscript 
and the value it designates. A subscript identifies a position at which a value is 
stored. In DAYS the index type and the base type are not the same, so there should 
be little danger of confusion; but with something like 


mark : array [0..99] of 0..99


a beginner might assume, mistakenly, that the value of MARK[48] is 48. It may
not be. 

A two-dimensional array requires two subscripts to locate a particular element. 
It may be thought of as a matrix or grid. Because matrices lend themselves to tabular 
presentation on paper, the first subscript is often spoken of as the ‘row’ and the 
second as the ‘column’. The image of a rectangular grid is sometimes, but not 
always, helpful. 

Another way to think of a two-dimensional array is as a vector of vectors, and 
Pascal actually provides two alternative notations to reflect these two viewpoints. 
We could either say 


type line = array [1..15] of char;
var scrabble : array [1..15] of line;


or, more succinctly,
var scrabble : array [1..15,1..15] of char;


to set up a representation of the 15-by-15 board of Scrabble. We could then place
an asterisk on the centre square by


scrabble[8][8] := '*';
in the former case, and by
scrabble[8,8] := '*';


in the latter. 

Although the second style is more usual, and more convenient, both are 
equivalent; and either may be used. The more long-winded method reminds us that 
the first subscript selects one item (from 15) which is a LINE and the second 
subscript selects one item in that LINE, which is a CHAR. 

If you want a pictorial guide to the contents of matrix SCRABBLE, the top 
left-hand ‘corner’ of the board might look as depicted beneath. 


[Figure: top left-hand corner of the board, with columns 1 to 8 across and
rows 1 to 4 down; row 1 holds the word FORTY.]


Here SCRABBLE[1,1] = 'F', SCRABBLE[1,2] = 'O', SCRABBLE[2,4] = 'W',
SCRABBLE[4,2] = ' ' (blank) and so on. However it is as well to realize that what
we do when we store information of this kind in the computer is to represent the 
real world of flat cardboard and plastic squares, etc. The array is an abstract structure 
which may correspond well, or badly, with what we use it to represent. We should 
not be captivated by an icon which is only meant to aid our understanding. 

Some typical applications of two-dimensional arrays include representing:


a table of inter-city distances;
a chess board;
coefficients in a system of equations;
shading information for a map;
sectors of the galaxy in Star Trek;
a contingency table;
temperatures at various points on a surface.


There are of course many others. 

We can show a two-dimensional array in use by extending the letter-counting 
example to tally the frequencies of letter pairs instead of individual letters. This 
might be a first step towards a statistical analysis of some English text. 


program counter4;
(* counts frequencies of each 2-letter sequence *)


const null = '@';   (* dummy character for all non-alphabetics *)
type letters = null..'z';


var c2 : array [letters,letters] of integer;
    ch,last : char;
    n : integer;
    finished : boolean;


begin
  (* initialization *)
  n := 0;
  for last := null to 'z' do
    for ch := null to 'z' do
      c2[last,ch] := 0;
  last := null;   (* pre-set last character read *)

  writeln('type text ended by fullstop:');

  repeat   (* main loop *)
    read(ch);   n := n+1;
    finished := ch = '.';   (* test before the fullstop is replaced by null *)
    if (ch < 'a') or (ch > 'z') then
      ch := null;
    (* all non-alphabetics treated alike *)
    c2[last,ch] := c2[last,ch] + 1;   (* increment pair count *)
    last := ch;   (* last character scanned *)
  until finished;

  (* now print table of results *)
  writeln;
  write(' ':3);
  for ch := null to 'z' do
    write(ch:3);   (* header line *)
  writeln;
  for last := null to 'z' do
    begin
      write(last:3);   (* row label *)
      for ch := null to 'z' do
        write(c2[last,ch]:3);   (* row of digram counts *)
      writeln;
    end;   (* of line in the o/p table *)
end.
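A partial output from this program is shown below.


[Table: the top left-hand corner of the digram table, with rows and columns headed @, A, B and C; the individual counts are not legible here.]
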

This indicates that the combination AB occurred 4 times, that the sequence BA 
occurred twice and that B was followed 8 times by a non-alphabetic character, 
among other things. (We used '@' as a null character because it immediately precedes
'A' in the ASCII collating sequence.)

Notice how the frequency table is printed out by a FOR loop nested within 
another FOR loop. This is standard practice for dealing with two-dimensional 
arrays. Indeed the FOR loop and the array are natural partners: the FOR statement 
was designed explicitly for traversing arrays, though it has other uses. 
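
The pattern can be seen in isolation in the following minimal sketch, which is not one of the book's programs. The program name TOTALS, the identifiers TABLE, ROWS, COLS and SUM, and the figures stored in the table are all assumptions invented for illustration. It prints a small table with its row totals, then swaps the two loops to obtain the column totals.

```pascal
program totals;
(* sketch: row and column totals of a small two-dimensional table *)
(* the names and the data below are invented for illustration *)

const rows = 2;
      cols = 3;

var table : array [1..rows,1..cols] of integer;
    r,c,sum : integer;

begin
   (* fill the table with made-up counts *)
   for r := 1 to rows do
      for c := 1 to cols do
         table[r,c] := 10*r + c;

   (* outer loop selects a row, inner loop runs along it *)
   for r := 1 to rows do
   begin
      sum := 0;
      for c := 1 to cols do
      begin
         write(table[r,c]:5);
         sum := sum + table[r,c];
      end;
      writeln(sum:7);    (* row total at end of line *)
   end;

   (* swapping the loops gives the column totals *)
   for c := 1 to cols do
   begin
      sum := 0;
      for r := 1 to rows do
         sum := sum + table[r,c];
      write(sum:5);
   end;
   writeln;
end.
```

Whichever subscript varies in the inner loop is the one that changes fastest, so the order of nesting decides whether the array is scanned row by row or column by column.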


8.4 Arrays of characters 


A string is a sequence of characters. A true string datatype is found in Basic and 
some other languages, including certain versions of Pascal (e.g. UCSD Pascal). The 
distinctive feature of strings is that their length can vary. Unfortunately standard 
Pascal does not have string variables or operators to act on variable-length strings. 

However in many applications a fixed upper limit can be placed on the length of 
a string, and then strings can be held as character vectors. For example, 


    type line = array [1..80] of char;

might be used to hold a line of characters read, say, from a punched card; or

    type name = array [1..20] of char;

might suffice for storing people's surnames, which very rarely exceed 20 characters
in length.

Most computers can fit more than one character into a single storage location; 
for example, four 8-bit bytes can be packed into one 32-bit computer ‘word’. 
Pascal provides the PACKED qualifier to economize on storage in such cases. Thus 
the types 


    type name = array [1..20] of char;
         packname = packed array [1..20] of char;


are equivalent except that the latter would require less storage space (and possibly 
take longer to access). PACKED is a directive to the compiler: its effect, if any, is 
‘behind the scenes’. The meaning of the program should remain the same whether 
PACKED is inserted or not. 

Many Pascal implementations have a type 


    alfa = packed array [1..10] of char;


which is predefined. ALFA is convenient for holding (short) words, names and such 
like. Ten letters seems quite a handy size — at least in English. In this book we will 
use ALFA from time to time. 

To facilitate the use of packed character arrays as strings the six relational 
operators are defined for comparisons of equal-length strings (arrays or constants). 
Hence

    'uvwx' < 'uvwz'

is true, while

    'abcd' < 'ab  '

is false. (The space precedes all letters in all respectable character sets.)

Note that the lexical order is based on the character collating sequence, which
may differ between different machines, although 'A' < 'B' < ... < 'Z' always holds,
as does '0' < '1' < ... < '9'.
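
The two comparisons above can be tried out with a minimal sketch along the following lines. It is not from the book: the program name STRCOMP, the type STR4 and the variables A and B are assumptions made up for the purpose, and the result of the second test depends on the space preceding the letters in the machine's character set, as it does in ASCII.

```pascal
program strcomp;
(* sketch: comparing equal-length packed character arrays *)
(* the type name str4 and the variable names are invented here *)

type str4 = packed array [1..4] of char;

var a,b : str4;

begin
   a := 'uvwx';
   b := 'uvwz';
   if a < b then writeln(a,' comes before ',b)
            else writeln(a,' does not come before ',b);

   a := 'abcd';
   b := 'ab  ';    (* two trailing blanks make up the length *)
   if a < b then writeln(a,' comes before ',b)
            else writeln(a,' does not come before ',b);
end.
```

Both operands of each comparison have exactly four characters; the comparison is only defined for strings of equal length.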

String constants are taken to be of type 

    packed array [1..sl] of char

where SL is the length of the string. This means that a string constant can be assigned
to a packed array, provided the constant is just the right length, as shown below.

    type name = packed array [1..20] of char;
    var word : alfa;
        n : name;

    word := 'decimalize';
    n := 'mr richard s forsyth';

After these two assignments WORD > 'DECIMALISE' (because 'Z' > 'S') and
N[4] = 'R' etc. Remember to pad out string constants with blanks to the right length
if necessary. For instance in

    word := 'decimal   ';

the three trailing spaces are required.


8.5 Array of hope 
Here some further points on the use of arrays are noted, which will become useful
as you gain familiarity with the concept through practice.


8.5.1 Multi-dimensional arrays 


Arrays of three or more dimensions may be declared, e.g.

    var cube : array [1..4,1..4,1..4] of integer;

which could be accessed by

    cube[2,i,n mod 4 + 1]

or

    cube[2][i][n mod 4 + 1]

or something similar. Pascal places no upper limit on the number of dimensions. In
practice most compilers have some sort of limitation, which is usually very generous.
I have seen 255 quoted as the maximum number of dimensions on one system:
anyone who can think of a sensible use for a 256-dimensional array should write
off to The Guinness Book of Records. (NB The maximum number of elements in
an array is distinct from, and normally much greater than, the maximum number of
dimensions.)
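
As a minimal sketch of how such an array is traversed (not one of the book's programs), the following fills CUBE and adds up its elements with one FOR loop per dimension. The program name CUBEDEMO, the variable TOTAL and the values stored are assumptions chosen purely for illustration.

```pascal
program cubedemo;
(* sketch: filling and summing a three-dimensional array *)
(* the program name and the choice of values are invented here *)

var cube : array [1..4,1..4,1..4] of integer;
    i,j,k,total : integer;

begin
   (* one FOR loop per dimension *)
   for i := 1 to 4 do
      for j := 1 to 4 do
         for k := 1 to 4 do
            cube[i,j,k] := i + j + k;

   total := 0;
   for i := 1 to 4 do
      for j := 1 to 4 do
         for k := 1 to 4 do
            total := total + cube[i][j][k];    (* same element as cube[i,j,k] *)

   writeln('sum of all ',4*4*4:1,' elements = ',total:1);
end.
```

The second set of loops uses the bracket-by-bracket form of subscripting simply to show that the two notations reach the same elements.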


8.5.2 Passing arrays as parameters 


If you pass an array to a procedure it is usually advisable to do so as a VAR
parameter, even if the procedure does not alter its contents. Otherwise time and
space are wasted by making a complete copy of the array as a local entity within
the procedure or function.

Remember also that parameters must have their type specified by name, so the
heading

    procedure p (var v : array [1..100] of real);

will not do. To correct it requires a type declaration such as

    type vect = array [1..100] of real;

so that the heading can be amended to

    procedure p (var v : vect);

to achieve the intended result.
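
Both points can be seen together in the following minimal sketch. VECT is the type from the text; the program name VARDEMO, the function MEAN, the constant SIZE and the variable DATA are assumptions made up for illustration.

```pascal
program vardemo;
(* sketch: passing an array to a function as a VAR parameter *)
(* vect comes from the text; mean, data and size are invented *)

const size = 100;

type vect = array [1..size] of real;

var data : vect;
    i    : integer;

function mean(var v : vect) : real;
(* v is passed by reference, so the 100 elements are not copied *)
var k : integer;
    s : real;
begin
   s := 0.0;
   for k := 1 to size do s := s + v[k];
   mean := s/size;
end;

begin
   for i := 1 to size do data[i] := i;
   writeln('mean = ',mean(data):8:2);
end.
```

The price of the VAR parameter is that MEAN could alter DATA if it were carelessly written, so the saving in time and space is bought at some cost in safety.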


8.5.3 Operations on entire arrays 


The only operations defined on entire arrays are assignment and comparisons for 
equality — except as mentioned in Section 8.4 concerning strings. 
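
Assignment of a whole array might look like the following minimal sketch, in which the program name COPYDEMO, the type ROW and the variables A and B are assumptions invented for illustration. Note that both arrays are declared with the same named type, which is what makes the assignment legal.

```pascal
program copydemo;
(* sketch: assigning one entire array to another *)
(* the type and variable names are invented for illustration *)

type row = array [1..5] of integer;

var a,b : row;
    i   : integer;

begin
   for i := 1 to 5 do a[i] := i*i;

   b := a;        (* copies all five elements in one statement *)
   a[1] := 0;     (* changing a afterwards leaves b untouched *)

   for i := 1 to 5 do write(b[i]:4);
   writeln;
end.
```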


8.5.4 Allowable components 


Officially the base type of an array may be any type, meaning that arrays containing 
any kind of component are allowed. In practice most compilers forbid arrays of 
files (see Chapter 9). 


8.6 Example program [SHOWOFF]


For those of us who are not expert, published games of chess are somewhat 
inscrutable. The moves are listed, but it is up to the reader to keep track of the 
state of play at any given time. If you are a poor visualizer, or just moderately lazy, 
you will have to resort to a chessboard and play it out for yourself. 

But why pay £5 for a chess set and do it by hand when your £500 personal 
computer can do the job for you? 

The diagram below represents the starting position of a chess game using the 
algebraic notation. 


          A   B   C   D   E   F   G   H
     8   BR  BN  BB  BQ  BK  BB  BN  BR
     7   BP  BP  BP  BP  BP  BP  BP  BP
     6
     5
     4
     3
     2   WP  WP  WP  WP  WP  WP  WP  WP
     1   WR  WN  WB  WQ  WK  WB  WN  WR

The pieces are: K (king), Q (queen), R (rook), B (bishop), N (knight) and P (pawn) 
prefixed by W (white) or B (black) as appropriate. Each square has a unique address, 
e.g. the black rooks start on A8 and H8 and the white queen starts at D1. 

The moves of the game can be recorded by specifying a source and a destination 
address. For example the two moves 


    E2 E4
    D7 D5

mean that white moved a pawn to king's four and black responded by moving the
pawn in front of the queen two steps forward. 

The problem is to write a program that will read in a series of moves in this 
format and adjust an internal representation of the chessboard to correspond with 
the new board position. When the special dummy move 


    Z0 Z0


is read, the program is to print out the current board state showing where all the 
pieces are, using the same format as the diagram above. When a dollar sign (‘$’) is 
read, the program will stop (and display the final positions). 

The program SHOWOFF solves this problem. Note that it makes use of an array 
as the obvious representation of the chess board. It also makes heavy use of 
procedures, enabling the main program to be compact and — so I hope — readily 
understandable. The main program comes last but it should be read first: essentially 
it summarizes the processing. Then you can try to work out how each of the 
procedures works, and how they interrelate. 


8.6.1 Program listing 


It should be emphasized that the program here does not play chess. That really 
would be difficult. It merely acts as an aide-mémoire. 

Also its input validation is rudimentary, being merely a check that the source 
and destination squares exist on the board. Since illegal moves are not rejected, it 
will let you have an anarchic game of chess if you have a mind to — moving any piece 
anywhere at will. By the same token it will let you move pieces back to where they 
were or correct errors without complaint. 


    program showoff;
    (* maintains record of chess game displaying it when requested *)

    const stop = '$';

    type chessman = (bp,bn,bb,br,bq,bk,wp,wn,wb,wr,wq,wk,none);
         grid = array ['a'..'h','1'..'8'] of chessman;

    var chbd : grid;    (* the chess board *)
        col1,col2,row1,row2 : char;
        move : integer;    (* integer, to match the var parameter of boardupd *)
        endgame : boolean;


    procedure setboard;    (* initializes board before play *)
    var c,r : char;
    (* operates on global array chbd *)

    begin
       for c := 'a' to 'h' do
       begin
          chbd[c,'7'] := bp;
          chbd[c,'2'] := wp;
       end;    (* pawns in place *)
       c := '1';  r := '8';
       chbd['a',r] := br;  chbd['a',c] := wr;
       chbd['b',r] := bn;  chbd['b',c] := wn;
       chbd['c',r] := bb;  chbd['c',c] := wb;
       chbd['d',r] := bq;  chbd['d',c] := wq;
       chbd['e',r] := bk;  chbd['e',c] := wk;
       chbd['f',r] := bb;  chbd['f',c] := wb;
       chbd['g',r] := bn;  chbd['g',c] := wn;
       chbd['h',r] := br;  chbd['h',c] := wr;
       (* pieces in place *)
       for r := '6' downto '3' do
          for c := 'a' to 'h' do chbd[c,r] := none;
          (* empty squares *)
    end;    (* of initialization *)


    procedure display(var cb : grid; m : real);
    (* prints image of chess board *)
    var c,r : char;

    begin    (* heading first *)
       writeln;
       writeln('state of play after ',m:5:1,' moves....');
       writeln;
       write('   ');
       for c := 'a' to 'h' do write(c:4);    (* top labels *)
       writeln;
       for r := '8' downto '1' do    (* rank *)
       begin
          write(r:2,' ');    (* side label *)
          for c := 'a' to 'h' do    (* file *)
             if cb[c,r] = none then write('..':4)    (* blank sq. *)
             else write(cb[c,r]:4);
             (* assumes chessman can be written *)
          writeln;
       end;
       writeln; writeln;
    end;    (* of board show *)


    procedure boardupd(var cb : grid; c1,r1,c2,r2 : char; var move : integer);
    (* revises board position given new move *)
    (* test is local function to check input *)

       function test(c,r : char) : boolean;
       (* tests whether c and r are valid input *)
       begin
          if (c >= 'a') and (c <= 'h')
             and (r >= '1') and (r <= '8') then
             test := true
          else test := false;
       end;    (* test *)

    begin    (* board update procedure *)
       if test(c1,r1) and test(c2,r2) then
       begin
          cb[c2,r2] := cb[c1,r1];    (* onto new square *)
          cb[c1,r1] := none;         (* clear off old one *)
          move := move + 1;
       end
       else
          writeln('bad move: ',c1,r1,' ',c2,r2);
    end;    (* of board update *)


    procedure readmove(var c1,r1,c2,r2 : char);
    (* gets next move from user *)

       procedure skipupto(var c : char);
       (* skips blanks and new lines for free-form input *)
       begin
          if not endgame then    (* stop looking after terminator *)
          begin
             repeat read(c) until (c <> ' ');
             endgame := c = stop;    (* may set global variable *)
          end;
       end;    (* of skipupto *)

    begin    (* read move *)
       skipupto(c1); skipupto(r1);
       skipupto(c2); skipupto(r2);
    end;    (* of readmove *)


    begin    (* main line *)
       writeln('chess companion:');
       writeln('type moves as source and destination squares, e.g.');
       writeln('   c7 c5');
       writeln('use z0 z0 to display board.');
       writeln('use ',stop,' to terminate.');
       writeln;

       setboard;    (* first fill the board *)
       endgame := false;
       move := 0;

       while not endgame do    (* endgame=true when game is over *)
       begin
          readmove(col1,row1,col2,row2);
          if (col1 = 'z') and (col2 = 'z') or endgame then
             display(chbd,move/2)
          else
             boardupd(chbd,col1,row1,col2,row2,move);
             (* revise board according to latest move *)
       end;    (* of main loop *)

       writeln;
       writeln('game over.');
    end.


8.6.2 Sample output 


Here is the printout from running this program on a published game — the only 
game in the 1980 World Computer Chess Championship in which a micro-based 
program beat one running on a mainframe. This competition took place in Linz, 
Austria on 25 to 29 September 1980. 

Note that for castling a double move had to be specified. Notice also that 
captures do not need any special processing since by landing on a square a piece 
obliterates any prior occupant. 


    WHITE: Advance 1.0 (U.K.) - Bit-Slice Micro
    BLACK: Dark Horse (Sweden) - Univac 1100/81


    CHESS COMPANION:
    TYPE MOVES AS SOURCE AND DESTINATION SQUARES, E.G.
       C7 C5
    USE Z0 Z0 TO DISPLAY BOARD.
    USE $ TO TERMINATE.

    E2 E4   E7 E5
    G1 F3   B8 C6
    F1 B5   G8 F6
    H1 F1   E1 G1   F8 E7
    F1 E1   E7 D6
    Z0 Z0

    STATE OF PLAY AFTER   5.5 MOVES....

          A   B   C   D   E   F   G   H
     8   BR  ..  BB  BQ  BK  ..  ..  BR
     7   BP  BP  BP  BP  ..  BP  BP  BP
     6   ..  ..  BN  BB  ..  BN  ..  ..
     5   ..  WB  ..  ..  BP  ..  ..  ..
     4   ..  ..  ..  ..  WP  ..  ..  ..
     3   ..  ..  ..  ..  ..  WN  ..  ..
     2   WP  WP  WP  WP  ..  WP  WP  WP
     1   WR  WN  WB  WQ  WR  ..  WK  ..

    D2 D4   C6 D4
    F3 D4   A7 A6
    B5 C4   B7 B5
    C4 D5   F6 D5
    E4 D5   H8 F8   E8 G8
    D4 F5   C8 B7
    Z0 Z0

    STATE OF PLAY AFTER  12.0 MOVES....

          A   B   C   D   E   F   G   H
     8   BR  ..  ..  BQ  ..  BR  BK  ..
     7   ..  BB  BP  BP  ..  BP  BP  BP
     6   BP  ..  ..  BB  ..  ..  ..  ..
     5   ..  BP  ..  WP  BP  WN  ..  ..
     4   ..  ..  ..  ..  ..  ..  ..  ..
     3   ..  ..  ..  ..  ..  ..  ..  ..
     2   WP  WP  WP  ..  ..  WP  WP  WP
     1   WR  WN  WB  WQ  WR  ..  WK  ..

    C1 H6   G7 H6
    D1 G4   $

    STATE OF PLAY AFTER  13.5 MOVES....

          A   B   C   D   E   F   G   H
     8   BR  ..  ..  BQ  ..  BR  BK  ..
     7   ..  BB  BP  BP  ..  BP  ..  BP
     6   BP  ..  ..  BB  ..  ..  ..  BP
     5   ..  BP  ..  WP  BP  WN  ..  ..
     4   ..  ..  ..  ..  ..  ..  WQ  ..
     3   ..  ..  ..  ..  ..  ..  ..  ..
     2   WP  WP  WP  ..  ..  WP  WP  WP
     1   WR  WN  ..  ..  WR  ..  WK  ..

    GAME OVER.


At this point Black resigned.


8.7 Exercises


1. Below is a record of the strength of emission (‘H—K flux’) at a certain wavelength 
of light by a sunlike star (HD catalogue number 81809) within 80 light-years of our 
solar system, as measured over 15 years on a telescopic spectrometer. 


     66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
    .18 .22 .20 .20 .17 .16 .15 .16 .18 .19 .20 .21 .16 .17 .16


With such data it is often important to look for sequences, ‘peaks’ and ‘troughs’. 
These may give a clue to cyclic behaviour in the star (like the sun’s 11-year sunspot 
cycle). So your task is to write a program which will read a series of observations of 
up to 15 real numbers for the years 1966 to 1980 and locate: 


(a) the longest ascending sequence of values;
(b) the longest descending sequence of values;
(c) all the peaks;
(d) all the troughs.


A peak may be defined for this purpose as a reading that is greater than its 
immediate predecessor and its immediate successor; a trough is one that is less than 
its predecessor and successor. There are peaks at 67 and 77 and troughs at 72 and 
78 in the values above. 

For the sequences print out the starting and ending years; for the peaks and 
troughs print out the year and the data value. Make sure that your program works 
at the boundaries, and that it gives sensible answers even when there are fewer than 2
input values and also when all the data are in ascending or descending order.


2. Write and test a procedure to reverse the order of characters in a string. It should 
begin with the heading 


    procedure reverse(var t : line; l : slength);


where T is of TYPE LINE = PACKED ARRAY [1..80] OF CHAR and L is of type
SLENGTH = 0..80 giving the actual number of characters in T.


3. An array A of N elements can be put into ascending order by the following 
simple algorithm. 


(a) If N < 2 then exit (job done).
(b) Find the location, I, of the largest element from A[1] to A[N].
(c) Interchange A[I] with A[N].
(d) Decrease N by 1, and repeat from step (a).


This is the selection sort. 

Write a procedure which will sort a vector of N elements into descending order 
using this method, and write a program to test it. (NB Better sorting methods are 
discussed in Chapter 13.)


4. The median of a set of numbers is that number which would be in the middle if
they were ranked. The median of


    1   9   4   8   2   10   0


is 4, which is greater than three of them (0, 1, 2) and less than three (8, 9, 10). The obvious
method of finding the median in a vector is to sort the items into order and then
pick the middle one. However an elegant and far more efficient procedure is known
which will locate not only the median but any desired quantile (such as the 25th
percentile, i.e. that value which exceeds 25% of the items).


    procedure quantile (var a : ivector; l,m,n : integer);
    (* finds mth element by rank in integer array a[l..n] *)
    var i,j,item,temp : integer;

    begin
       while l < n do
       begin
          item := a[m];  i := l;  j := n;
          repeat
             (* find item in lower part that is too big *)
             while a[i] < item do i := i + 1;
             (* find item in upper part that is too small *)
             while a[j] > item do j := j - 1;
             if i <= j then    (* exchange them *)
             begin
                temp := a[j];  a[j] := a[i];  a[i] := temp;
                i := i + 1;  j := j - 1;
             end;
          until j < i;    (* two sweeps have crossed *)
          if j < m then l := i;    (* new lower bound *)
          if m < i then n := j;    (* new upper bound *)
          (* place for trace *)
       end;
       (* now a[m] is in the correct relative position *)
    end;    (* of quantile *)


The call of this procedure to find the median would be

    quantile(v,1,n div 2,n)


which would ensure that the item V[N DIV 2] was greater than or equal to every item
before it in the vector and less than or equal to every item after it, roughly half the
items in each case.

Write a program to exercise this procedure as fully as possible. Try data that is
ordered, data that is all the same and any other odd cases you can think of. You
should not be able to make it fail.

The objective is to understand how the procedure works. If you have difficulty,
insert WRITE statements where the comment (* PLACE FOR TRACE *) is to
show L, N and the current contents of A[L] to A[N] at the end of each pass round
the main loop.


5. Write a program which accepts a number LIMIT from the user and then, for all 
numbers from 2 up to LIMIT, prints:


    ** prime **                                    if the number has no factors
    a list of the prime factors of that number     otherwise.


You should use an array to store all the prime numbers found so far below N, the
number currently under investigation. This will help to decide which candidates to
test as possible prime factors of N. If none of the primes in the array (not exceeding
SQRT(N)) divides N exactly then N is prime.


6. Look back at Section 6.8 to the algorithm for calculating Easter day. Package it 
up as a procedure (or function) and then use it in a program which runs through all 
the dates 1600 to 4000 AD and records which day Easter Sunday falls on in each 
year. The output of your program should give the number of times Easter falls on 
each of the possible dates 22 March to 25 April inclusive, from which it should be 
easy to see the commonest. (19 April should be the most frequently occurring.) 
Your program should also print a list of all the years in which Easter is 22 March 
and all those in which it is 25 April. These are the extremes which last happened in 
1818 and 1943 respectively. 

The array to tally the frequencies will need 35 elements. The numbering of the 
subscripts is up to you. 


7. Write a program to generate all the factorials from 1! to 100! where N! (i.e.
factorial of N) = N * (N-1) * (N-2) * ... * 2 * 1. The trouble is that 100! is far
too large to hold as a REAL or INTEGER in the computer: it is too large even to
print on one line. But you can represent very large numbers as arrays. Thus


    A[1]  A[2]  A[3]  A[4]  A[5]  A[6]  A[7]  A[8]
     120   331    45   201   331   870     1     0


might be one way of representing the integer 120 331 045 201 331 870 001 000 
which is over 10 to the power of 23. To multiply this number by 50, say, you 
multiply the last element by 50; keep the remainder (mod 1000) and ‘carry’ the 
quotient (div 1000) to the next place on the left; then you move one element to 
the left and repeat the process. All you have to do is mechanize the long 
multiplication rules you learned, or failed to learn, at school; but this will not be 
easy. You will also need to think carefully how to print out a number that is 
stored as an array of numbers. 

It is not necessary to be able to multiply two very long integers together, merely 
one very long one by a normal INTEGER. For instance, having computed factorial 
77 you multiply it by 78 to arrive at factorial 78, and so on to 100. 


The image of the computer as a ‘number cruncher’ becomes less and less true every 
year. Today’s computers are increasingly used as controllers, communicators and, 
above all, repositories of information. This information is held for long-term storage 
on backing-store devices such as magnetic discs or tapes in the form of files. 

Information is held on file for any or all of several reasons: because it must be 
shared between programs; because it must remain in existence after the program 
that generated it has ceased to run; because it is too large for the primary store of 
the computer. 

The great advantage of having data stored as a file or set of files is that once it is 
there you can do practically anything with it, limited chiefly by the limits of your 
imagination. When you reach this stage you think not of one program and its results 
but of a suite of programs clustered round one or more files — for creating, 
maintaining, analysing, rearranging and displaying the stored data. When you need 
to do something else with it, you just write another program. 


9.1 Types of file 
In data processing three major categories of file are distinguished: 


(1) sequential files; 
(2) direct-access files; 
(3) indexed (or keyed) files. 


A sequential file may be read (or written, but not both) serially from beginning 
to end, one item at a time. It is not possible to start processing a sequential file 
from anywhere except the beginning. There is no way, for example, to read up to 
a particular point and then start writing from there. The physical medium on which 
the concept of the sequential file is modelled is the magnetic tape, where the tape 
winds forward from one reel onto another past a read/write head that scans one 
datum at a time. 

A direct-access file is divided up into chunks, normally called ‘records’ (except 
that Pascal uses the word 'record' in another context), each of which may be
reached individually by its numeric address. This feature is characteristic of magnetic
disc rather than tape.

The indexed file is an interesting notion. It too requires the direct-access facilities 
of disc; but instead of locating a record by its address it is found by the contents of 
a key field which is part of the data in that record. Such a key might be a name, a 
social security number, an account code or the like: whatever key is used must be 
unique to the record it identifies. This is the most natural mode of access, an 
‘associative memory’, but it also requires the most work. In order to implement 
indexed files a hierarchy of indexes and pointers must be built up and manipulated 
by software. 

The fact that Pascal recognizes only the first of these three kinds of file all but 
disqualifies it as a serious data-processing language, though many implementations 
do provide some means of direct access — e.g. via built-in procedures. 

If you are new to programming, however, you will have enough on your plate 
learning about sequential files, which are all that we deal with in this chapter. 


9.2 Sequential files 


In Pascal a file is a sequence of components. All components are of the same type. 
Only one component is available for access at any one time. The number of 
components is variable, from zero upwards; indeed a file grows as new items are 
added to the end of it. 


9.2.1 Setting up a file 


To use a file in Pascal you must first declare it, and then open it. It is declared as a 
variable with a FILE type. For example 


    type intfile = file of integer;
    var f : intfile;
        wordlist : file of alfa;


declares two files, named F and WORDLIST. The former contains integers, the 
latter components of type ALFA, i.e. 10-character vectors. 

A file is opened for reading by the RESET procedure and for writing by the 
REWRITE procedure. REWRITE also erases its previous contents, if any (so be 
careful). The files F and WORDLIST might be opened as follows. 


    reset(f);             (* file f prepared for input *)
    rewrite(wordlist);    (* wordlist prepared for output *)


In addition, any files that exist before the program is run or that are to exist 
after it has terminated must be listed in the program heading. This is the normal 
case: files that are not mentioned in the program heading are strictly temporary. In 
previous chapters the programs have merely had names, for example 


    program chances;

but from now on the name will usually be followed by a list of files, as in

    program inandout (datafile,listing);


for example.

The name by which a file is known within a Pascal program and the name by 
which it is known outside may well differ. Normally the operating system has 
commands for associating one with the other, i.e. for connecting the Pascal names 
to particular files. These commands vary from one machine to another. It can be 
advantageous to have a program that reads from and/or writes to a variety of files 
so that, for example, the particular set of data used is decided only when the 
program runs. 


9.2.2 GET, PUT and the buffer


Once set up, a file may be processed one item at a time by the predefined 
procedures GET and PUT. The single component accessible at any given time is 
known as the ‘buffer’. 

This component is distinguished from the file itself by putting an up-arrow
'↑' after the file name. Thus if F names a file, F↑ refers to its buffer.

This distinction between the file as a whole (F) and the component being
scanned (F↑) is important to grasp.

GET is used on a file opened by RESET. It moves along the file to the next item 
and makes that item available to the program as the value of the file buffer variable. 
PUT is used on a file opened by REWRITE. It appends the contents of the file 
buffer to the end of the file, thus lengthening it by one item. 

Thus 


    get(f); i := f↑;


moves on to the next item on the file F and places its contents in the variable I; 
while 


    wordlist↑ := 'data-files'; put(wordlist);


loads WORDLIST’s buffer with the ten characters DATA-FILES and then transfers 
these by affixing them to the end of the file. 

NB RESET performs an implicit GET. As well as opening a file for input it places 
its first component in the file buffer, unless the file is empty. 
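
The way GET, PUT and the buffer work together can be sketched as follows. This is not one of the book's programs: the program name BUFDEMO and the variable TOTAL are invented for illustration, and the up-arrow is typed as ^, its usual keyboard form. The sketch assumes a compiler that provides GET, PUT and the buffer variable as described here; some dialects offer only READ and WRITE on files, in which case it would need adapting.

```pascal
program bufdemo(output);
(* sketch: writing a file with PUT and reading it back with GET *)
(* f is a temporary file, so it is not listed in the program heading *)

var f : file of integer;
    i,total : integer;

begin
   rewrite(f);               (* open for writing *)
   for i := 1 to 5 do
   begin
      f^ := i*i;             (* load the buffer ... *)
      put(f);                (* ... and append it to the file *)
   end;

   reset(f);                 (* open for reading; first item is now in f^ *)
   total := 0;
   for i := 1 to 5 do        (* five items were written, so five are read *)
   begin
      total := total + f^;   (* use the item in the buffer *)
      get(f);                (* then move on to the next one *)
   end;
   writeln('total of items on file = ',total:1);
end.
```

Because RESET has already fetched the first component, the reading loop uses the buffer first and calls GET afterwards; calling GET first would skip the first item.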


9.2.3 End of file


When reading information from a file a program needs to be able to test whether 
there is any more data left. To enable this Pascal provides a standard function EOF. 
EOF(f), where ‘f’ stands for any file name, becomes true when all the data in the 
file has been read. (If the file is empty it is true to begin with.) An attempt to read 
further when EOF is true causes an execution error. With EOF, files of unknown
length can be handled.




9.2.4 Two small examples 


Our first example reads a series of integers from file F which is presumed to exist 
already and counts the number of positive, zero and negative values. 


    program intcount(f);

    var f : file of integer;
        n,p,z : integer;

    begin
       (* set counters *)
       n := 0;  z := 0;  p := 0;

       reset(f);    (* open for reading and get first item *)
       while not eof(f) do
       begin
          if f↑ < 0 then n := n + 1
          else if f↑ > 0 then p := p + 1
          else z := z + 1;
          get(f);
       end;    (* main loop *)

       writeln(n+p+z:10,' items on file.');
       writeln('positive':12,'zero':12,'negative':12);
       writeln(p:12,z:12,n:12);
    end.


Our second example reads a series of words (of 10 characters) from the keyboard
and puts them on a file. If the input string is '**********' it stops.


    program putwords(wordlist);

    type wordfile = file of alfa;
    var wordlist : wordfile;
        w : alfa;
        c,k : integer;
        done : boolean;

    begin
       writeln('give 10-letter words one per line:');
       writeln('halt with ********** on last line.');
       k := 0;
       rewrite(wordlist);
       repeat
          for c := 1 to 10 do read(w[c]);
          readln;    (* words have to be padded to 10 characters *)
          done := w = '**********';
          if not done then
          begin
             wordlist↑ := w;
             put(wordlist);
             k := k + 1;
          end;
       until done;

       writeln(k:7,' words stored on file.');
    end.


9.2.5 The special files INPUT and OUTPUT 


We have introduced the use of files as an advanced topic (as in most respects it is) 
but in reality we have been using files all along. 

There are two special files in Pascal, INPUT and OUTPUT. One of their 
peculiarities is that they do not have to be declared as variables. In fact they must 
not be declared. Nor need RESET or REWRITE be used on INPUT and OUTPUT: 
they are opened by the system. (Moreover on most systems they have to be listed 
in the program heading.) 

INPUT is allocated by default to the standard input device — normally a terminal 
keyboard, though on older, batch-mode systems it may be a card reader. 

OUTPUT is connected to the standard output device, unless the user specifies 
otherwise — typically a display screen or a printer. 

The terminal can be treated as a file because it generates, or consumes, a stream 
of characters one by one. It is therefore a sequence. In Pascal a sequential file of 
characters is called a ‘text file’, and a type TEXT 


    type text = file of char;

or sometimes

    type text = packed file of char;

is predefined. INPUT and OUTPUT are both text files.


9.2.6 READ and WRITE revisited 


Any of the four procedures READ, READLN, WRITE or WRITELN can take a 
text file as its first parameter. 


There is an equivalence between

    read(f,ch);

and

    ch := f↑;  get(f);

where CH is of type CHAR and F of type TEXT. Likewise

    write(f,ch);

is equivalent to

    f↑ := ch;  put(f);

where the types are the same as before. Furthermore

    read(input,ch)

may be abbreviated to

    read(ch)

and

    write(output,ch)

to

    write(ch)


since INPUT or OUTPUT (as appropriate) is assumed if no file is given as first 
argument of READ or WRITE. READLN and WRITELN behave similarly. So 
whenever we used READ or READLN in earlier chapters we were, without 
mentioning it, also using INPUT. And whenever we used WRITE or WRITELN we 
were using OUTPUT. 


9.2.7 Text files 
But you may have noticed that we can READ and WRITE numbers as well as single 
characters. 

The procedures READ, READLN, WRITE and WRITELN have been extended 
in two ways: firstly, as already discussed, they can have one or more parameters; 
secondly, the parameters can be of several types. 

The types permitted on all installations are CHAR (of course, since it is 
fundamental), INTEGER and REAL. In many versions of Pascal ALFA, BOOLEAN, 
enumerated types and sets (see Chapter 10) are also allowed as input/output 
parameters, and we assume so in this book. The Pascal system converts between its 
internal format, which is some sort of bit pattern, and an external representation in 
terms of a character sequence when READ, READLN, WRITE or WRITELN is used 
on a text file. 

If you do not appreciate how much work this does for you, try writing the code
to read such diverse forms as

    '1'
    '1.23'
    '1.23e4'
    '-1.23e-45'

and convert them into REAL values by scanning one character at a time
(checking for errors as well).

Text files differ from ordinary files in another respect: they have two levels. A 
text file is a sequence of lines each of which is a sequence of characters. The line
may contain zero or more characters and is ended by a special terminator symbol 
which is not treated by Pascal as a character. Very commonly the end-of-line marker 
is in fact a pair of characters — carriage-return and line-feed. WRITELN generates 
the special symbol or symbols to indicate the end of a line, and READLN may be 
used to scan up to, and just beyond, the end of a line. 

In order to test whether the end of a line has been reached the standard Boolean 
function EOLN should be used. EOLN(TF) is true when reading on file TF has just 
passed the last character before the line terminator, and false otherwise. An attempt 
to 


read (tf,ch) 


when EOLN(TF) is true will put a dummy blank character into the variable CH and 
advance to the start of the next line. EOLN makes it possible to distinguish genuine 
blank spaces from those produced by reading over the end of a line. READLN(TF) 
can always be used to advance to the next line. 

When reading INTEGERs and REALs end-of-line markers are treated as spaces 
and skipped by READ if necessary. 
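
The two-level structure can be seen in a short sketch that counts the characters on each line of INPUT. The program name LINECOUNT and its variables are chosen only for this illustration; the inner loop stops at the end of each line and READLN then steps over the terminator.

```pascal
program linecount (input, output);
(* counts the characters on each line of the standard input *)
var ch : char;
    lines, chars : integer;
begin
   lines := 0;
   while not eof(input) do
   begin
      chars := 0;
      while not eoln(input) do
      begin
         read(input, ch);        (* a genuine character of the line *)
         chars := chars + 1;
      end;
      readln(input);             (* step over the line terminator *)
      lines := lines + 1;
      writeln('line ', lines:1, ' has ', chars:1, ' characters');
   end;
end.
```

Because EOLN is tested before every READ, the dummy blank that stands for the end of a line is never counted as a character.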

Note that ordinary files, i.e. files of types other than TEXT, are not normally 
legible except by Pascal programs. This means that if you use the normal command 
(outside Pascal) to list a non-text file on the terminal or printer you will get some 
strange results, because the data are held in binary form, not as characters. 


9.2.8 An analogy 


If you want a simple model of the Pascal sequential file, imagine a deck of cards 
stacked, face up, on a table. The top card can be inspected without disturbing the 
pile; the cards beneath it are not visible. There are several points of correspondence 
with the various file operations. 


   state after RESET     cards face-up on table 
   the file buffer       the top card 
   GET                   picking up the top card and discarding it 
   READ                  picking up and keeping the top card 
   end of file           a bare table 


9.3 Example program [SEARCHER] 


The computer industry is notorious for its jargon. Computer jargonauts, of whom 
there are many, have a fatal weakness for abbreviations, especially acronyms. (A 
jargonaut is someone who calls a jogger’s jockstrap a ‘run-time support system’.) 

The following program can be considered as a small prototype of an interactive 
companion — helpful to those who, like myself, often wish to make sense of 
microcomputing articles larded with NMOS and MNOS and CMOS and PMOS and 
suchlike. Give the program an abbreviation and it responds, if it can, with the full 
description. 

The datafile used (called GLOSSARY in the program) is a text file with the 
abbreviations in alphabetical order, each followed by its description on a separate 
line. It is listed below in its entirety. 


ABC 
Absolute Binary Code 
AC 
Accumulator 
ADC 
Analogue to Digital Converter 
ALU 
Arithmetic Logic Unit 
ASCII 
American Standard Code for Information Interchange 
BCD 
Binary Coded Decimal 
CCD 
Charge Coupled Device 
CP/M 
Control Program for Microcomputers 
CPU 
Central Processing Unit 
CRT 
Cathode Ray Tube (or Cathode Ray Terminal) 
DAC 
Digital to Analogue Converter 
DEC 
Digital Equipment Corporation 
DMA 
Direct Memory Access 
DMAC 
Direct Memory Access Controller 
DP 
Data Processing 
EAROM 
Electrically Alterable Read-Only Memory 
EBCDIC 
Extended Binary Coded Decimal Interchange Code 
ECL 
Emitter Coupled Logic 
ELAN 
Electronic Local Area Network 
EPROM 
Eraseable Programmable Read-only Memory 
I/O 
Input/Output 
IBM 
International Business Machines 
IC 
Integrated Circuit 
IEEE 
Institute of Electrical and Electronic Engineers 
LED 
Light Emitting Diode 
LSI 
Large Scale Integration 
MOS 
Metal Oxide Semiconductor 
MOSFET 
Metal Oxide Semiconductor Field Effect Transistor 
MPU 
Micro-Processing Unit 
MUX 
Multiplexor 
NMOS 
Negative-channel Metal Oxide Semiconductor 
OCR 
Optical Character Recognition 
PCB 
Printed Circuit Board 
PET 
Personal Electronic Transactor 
PIA 
Programmable Interface Adaptor 
PLA 
Programmable Logic Array 
PMOS 
Positive-channel Metal Oxide Semiconductor 
PNL 
Polytechnic of North London 
POST 
Point of Sale Terminal 
PROM 
Programmable Read-Only Memory 
RAM 
Random Access Memory (Read-And-write Memory) 
ROM 
Read-Only Memory 
SSI 
Small Scale Integration 
TTL 
Transistor-Transistor Logic 
UART 
Universal Asynchronous Receiver-Transmitter 
ULA 
Uncommitted Logic Array 
UVEPROM 
Ultra-Violet Eraseable Programmable Read-Only Memory 
VAT 
Value Added Tax 
VDU 
Visual Display Unit 
VLSI 
Very Large Scale Integration 


Even with only 50 entries it forms the basis of a useful glossary. 

The program uses this dictionary to look up the definition of any word the user 
types. The heart of the program is the search function LOOK which employs the 
Binary Search logic to locate the target word in the wordlist. This technique merits 
a flowchart to itself (Fig. 9.1). 

In the Pascal function LOOK which employs this method HI and LO are the 
upper and lower limits to the search. When the item sought is compared to the 
midpoint of the search area within the array there are three possible outcomes: 


   (1) they are equal, exit with success; 
   (2) the sought item is smaller; 
   (3) the sought item is larger. 


In cases (2) and (3) the search must continue; but in case (2) the entire top half 
of the array can safely be ignored, while in case (3) the entire bottom half can be 
ignored. So HI or LO is reset and a new test is made in the middle of the reduced 
range. If LO ever passes HI the search has failed: the sought item was not present. 


   Function LOOK 
      W     Value to be sought 
      WL    Ordered vector 
      N     No. of items in WL 

   1. LO := 1;  HI := N 
   2. If LO > HI then failure: LOOK := 0 (exit) 
   3. M := (LO + HI) div 2 
   4. Compare WL[M] with W 
         <   too low:   LO := M + 1, then back to step 2 
         =   success:   LOOK := M (exit) 
         >   too high:  HI := M - 1, then back to step 2 


Figure 9.1 Binary search (flowchart, given here as numbered steps) 


This function depends on the fact that in Pascal equal-length strings can be 
compared using the normal relational operators. 
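
The logic of Fig. 9.1 can be tried out on its own, away from the dictionary. The following is a minimal sketch on a small table of integers rather than strings; the program name BSEARCH, the table T and its contents are invented for the illustration. As in the flowchart, a result of 0 signals failure.

```pascal
program bsearch (output);
(* binary search on a small ordered table of integers *)
const n = 8;
type table = array [1..n] of integer;
var t : table;
    k : integer;

function look(w : integer; var wl : table) : integer;
(* returns the position of w in wl[1..n], or 0 if it is absent *)
var lo, hi, m, found : integer;
begin
   lo := 1;  hi := n;  found := 0;
   while (lo <= hi) and (found = 0) do
   begin
      m := (lo + hi) div 2;
      if wl[m] < w then lo := m + 1          (* too low *)
      else if wl[m] > w then hi := m - 1     (* too high *)
      else found := m;                       (* success *)
   end;
   look := found;
end;

begin
   for k := 1 to n do t[k] := 10 * k;        (* 10, 20, ... 80: ordered *)
   writeln('30 is at position ', look(30, t):1);
   writeln('35 is at position ', look(35, t):1);
end.
```

The first search succeeds at position 3 after three probes; the second ends with LO greater than HI and reports 0.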

Since the search space is halved on each cycle the process rapidly converges. 
The expected number of probes in an array of size N is not more than log2 of N. 
With N = 50, as here, this is less than 6; whereas a linear search from the beginning 
of the array one item at a time would take on average N/2 or 25 probes. This is a 
fivefold saving; but with larger N the saving is even more dramatic. If N were 
increased to 500 the linear search would take ten times as long; the Binary Search 
time would not even double. 
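
These figures are easy to check. The sketch below (MAXPROBES is a name made up here) counts how many times a table of N items can be halved before nothing is left, which is the largest number of probes a Binary Search can need, and sets it beside the N/2 average of the linear search.

```pascal
program probes (output);
(* compares linear and binary search costs for two table sizes *)

function maxprobes(n : integer) : integer;
(* worst case: how many times n can be halved before nothing is left *)
var count : integer;
begin
   count := 0;
   while n > 0 do
   begin
      n := n div 2;
      count := count + 1;
   end;
   maxprobes := count;
end;

begin
   writeln('n = 50:  linear average ', 50 div 2:1,
           ', binary at most ', maxprobes(50):1);
   writeln('n = 500: linear average ', 500 div 2:1,
           ', binary at most ', maxprobes(500):1);
end.
```

It reports at most 6 probes for 50 items and at most 9 for 500, against linear averages of 25 and 250.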

The Binary Search is another example of the computing principle: divide and 
conquer. You should not think that it is confined to looking up words in a 
dictionary. It can be used wherever a table look-up is required. Applications include: 
finding an address or telephone number given a name; finding a bank balance given 
an account number; finding a price given a product code; finding a disc track given 
the key to a record in an indexed file. It is one of the classic Computer Science 
techniques, simple and efficient. 

Of course the data must be ordered or the method will not work. 

The program SEARCHER is over 100 lines but its outline is straightforward. 


begin  (* main line *) 
   (* print heading *) 
   (* read data from file into array, 
      copying words onto the terminal 
      and checking the words really are in 
      ascending alphabetic order *) 
   (* instructions for user *) 
   repeat 
      (* get word from user *) 
      if word <> terminal then 
      begin 
         (* find its location using binary search *) 
         if (* not found *) then 
            (* show nearest word in dictionary *) 
         else (* show word itself *); 
         (* display corresponding definition *) 
      end; 
   until word = terminal;  (* signal to stop *) 
end. 


The words are read into an array of ALFA strings, and their definitions into an array 
of LINEs. The LINE data type is a packed array of characters, of length LMAX. 


9.3.1 Program listing 


program searcher (glossary,input,output); 
(* seeks acronyms on file and gives definitions if present *) 


const lmax = 64;  (* max line length on input file *) 
      wmax = 100;  (* max number of items on file *) 
      terminal = '**        '; 


type line = packed array [1..lmax] of char; 


var i,j,p,wc : integer; 
    word : alfa; 
    wordlist : array [1..wmax] of alfa; 
    meaning : array [1..wmax] of line; 
    glossary : text;  (* main data file *) 


procedure getline(var f : text; var this : line); 
(* reads one line from file f and puts into this[] *) 
var i,j : integer; 
begin 
   i := 0; 
   while not eoln(f) and not eof(f) and (i<lmax) do 
   begin 
      i := i+1; 
      this[i] := f^;  get(f);  (* next character *) 
   end;  (* of one line *) 
   for j := i+1 to lmax do 
      this[j] := ' ';  (* pad out to full length with blanks *) 
   if not eof(f) then readln(f);  (* move to new line *) 
end; 


function look(var w : alfa; hi : integer) : integer; 
(* seeks w in global array wordlist[1..hi] *) 
(* uses binary search to locate position *) 
var lo, midpoint : integer;  test : boolean; 


begin 
   lo := 1;  (* lowest possible position *) 
   test := false; 
   while (hi >= lo) and not test do 
   begin 
      midpoint := (hi+lo) div 2; 
      test := (w = wordlist[midpoint]); 
      if w > wordlist[midpoint] then  (* w too big *) 
         lo := midpoint + 1  (* look in upper half *) 
      else if w < wordlist[midpoint] then  (* w too small *) 
         hi := midpoint - 1;  (* look in lower half *) 
   end; 
   if test then look := midpoint  (* success *) 
   else look := -midpoint;  (* signal of failure *) 
end; 


begin  (* main line *) 
   writeln('interactive dictionary look-up program.'); 
   writeln; 
   writeln('list of abbreviations in dictionary:'); 
   writeln; 


   (* first read data from file *) 
   wc := 1; 
   reset(glossary); 
   while not eof(glossary) and (wc <= wmax) do 
   begin 
      readln(glossary,wordlist[wc]);  (* the acronym *) 
      (* readln is assumed defined for type alfa 
         but not necessarily for type line *) 
      getline(glossary,meaning[wc]);  (* its definition *) 
      writeln(wordlist[wc]);  (* copy to terminal *) 
      if wc > 1 then  (* check ascending sequence *) 
         if wordlist[wc] < wordlist[wc-1] then 
            writeln('** warning: ',wordlist[wc],' out of order!'); 
      wc := wc+1; 
   end; 
   wc := wc-1;  (* actual word count *) 
   writeln(wc:7,' entries read from file.'); 


   (* instructions *) 
   writeln('just type a line containing only'); 
   writeln(terminal); 
   writeln('to halt program.'); 
   writeln; 


   (* main search loop *) 
   repeat writeln; 
      write('what word do you seek ? '); 
      while input^ = ' ' do get(input);  (* ignore leading spaces if any *) 
      read(word); 
      if word <> terminal then 
      begin 
         p := look(word,wc); 
         if p <= 0 then  (* not found in wordlist *) 
         begin  p := abs(p);  (* p will now be nearest *) 
            writeln('sorry ',word,' is not known.'); 
            writeln('nearest item is:'); 
            write(wordlist[p]); 
         end 
         else  (* found it *) 
            write(word); 
         (* show its definition *) 
         write('--> '); 
         for i := 1 to lmax do  write(meaning[p][i]); 
         writeln; 
      end; 
      writeln; 
   until word = terminal; 
end. 


9.3.2 Sample runs 


Here is a run of SEARCHER on the DEC System-10 with the vocabulary as listed 
in Section 9.3. DEC System-10 Pascal automatically converts lower case CHAR 
values to upper case (capital letters) which is why the definitions do not look so 
good. 

Notice that when it cannot find a word it nevertheless prints the definition of 
the word where the search terminated — the nearest item alphabetically. Sometimes 
this is quite helpful (as with PIO which led to PIA and CPM which led to CP/M), 
other times less so (as with GOD leading to EPROM!). 


INTERACTIVE DICTIONARY LOOK-UP PROGRAM. 
LIST OF ABBREVIATIONS IN DICTIONARY: 


ABC 
AC 
ADC 
ALU 
ASCII 
BCD 
CCD 
CP/M 
CPU 
CRT 
DAC 
DEC 
DMA 
DMAC 
DP 
EAROM 
EBCDIC 
ECL 
ELAN 
EPROM 
I/O 
IBM 
IC 
IEEE 
LED 
LSI 
MOS 
MOSFET 
MPU 
MUX 
NMOS 
OCR 
PCB 
PET 
PIA 
PLA 
PMOS 
PNL 
POST 
PROM 
RAM 
ROM 
SSI 
TTL 
UART 
ULA 
UVEPROM 
VAT 
VDU 
VLSI 
50 ENTRIES READ FROM FILE. 
JUST TYPE A LINE CONTAINING ONLY 
** 
TO HALT PROGRAM. 


WHAT WORD DO YOU SEEK ? PROM 
PROM --> PROGRAMMABLE READ-ONLY MEMORY 


WHAT WORD DO YOU SEEK ? EAROM 
EAROM --> ELECTRICALLY ALTERABLE READ-ONLY MEMORY 


WHAT WORD DO YOU SEEK ? BOLLOCKS 
SORRY BOLLOCKS IS NOT KNOWN. 
NEAREST ITEM IS: 
CCD --> CHARGE COUPLED DEVICE 


WHAT WORD DO YOU SEEK ? OMR 
SORRY OMR IS NOT KNOWN. 
NEAREST ITEM IS: 
PCB --> PRINTED CIRCUIT BOARD 


WHAT WORD DO YOU SEEK ? ABC 
ABC --> ABSOLUTE BINARY CODE 


WHAT WORD DO YOU SEEK ? PIO 
SORRY PIO IS NOT KNOWN. 
NEAREST ITEM IS: 
PIA --> PROGRAMMABLE INTERFACE ADAPTOR 


WHAT WORD DO YOU SEEK ? CPM 
SORRY CPM IS NOT KNOWN. 
NEAREST ITEM IS: 
CP/M --> CONTROL PROGRAM FOR MICROCOMPUTERS 


WHAT WORD DO YOU SEEK ? GOD 
SORRY GOD IS NOT KNOWN. 
NEAREST ITEM IS: 
EPROM --> ERASEABLE PROGRAMMABLE READ-ONLY MEMORY 


WHAT WORD DO YOU SEEK ? TTL 
TTL --> TRANSISTOR-TRANSISTOR LOGIC 


WHAT WORD DO YOU SEEK ? LOVE 
SORRY LOVE IS NOT KNOWN. 
NEAREST ITEM IS: 
LSI --> LARGE SCALE INTEGRATION 


WHAT WORD DO YOU SEEK ? ** 


For the second run a change was made to LOOK to reveal the progress of the 
search. This is quite instructive. The line 


   writeln(lo:4,midpoint:4,hi:4,wordlist[midpoint]:12); 

was inserted immediately before the line 

   test := (w = wordlist[midpoint]); 


to display the lower and upper bounds of the search as well as the current midpoint 
and the word in the dictionary at that midpoint. This clearly shows the width of 
the search narrowing down step by step for two successful and two unsuccessful 
searches. The average number of probes is just over 5 (6, 6, 4 and 5 in turn). 


INTERACTIVE DICTIONARY LOOK-UP PROGRAM. 
LIST OF ABBREVIATIONS IN DICTIONARY: 
(* 50 LINES OMITTED HERE FOR BREVITY *) 


50 ENTRIES READ FROM FILE. 
JUST TYPE A LINE CONTAINING ONLY 
** 
TO HALT PROGRAM. 


WHAT WORD DO YOU SEEK ? EBCDIC 
   1  25  50  LED 
   1  12  24  DEC 
  13  18  24  ECL 
  13  15  17  DP 
  16  16  17  EAROM 
  17  17  17  EBCDIC 
EBCDIC --> EXTENDED BINARY CODED DECIMAL INTERCHANGE CODE 


WHAT WORD DO YOU SEEK ? GOD 
   1  25  50  LED 
   1  12  24  DEC 
  13  18  24  ECL 
  19  21  24  I/O 
  19  19  20  ELAN 
  20  20  20  EPROM 
SORRY GOD IS NOT KNOWN. 
NEAREST ITEM IS: 
EPROM --> ERASEABLE PROGRAMMABLE READ-ONLY MEMORY 


WHAT WORD DO YOU SEEK ? MOSFET 
   1  25  50  LED 
  26  38  50  PNL 
  26  31  37  NMOS 
  26  28  30  MOSFET 
MOSFET --> METAL OXIDE SEMICONDUCTOR FIELD EFFECT TRANSISTOR 


WHAT WORD DO YOU SEEK ? NOTHING 
   1  25  50  LED 
  26  38  50  PNL 
  26  31  37  NMOS 
  32  34  37  PET 
  32  32  33  OCR 
SORRY NOTHING IS NOT KNOWN. 
NEAREST ITEM IS: 
OCR --> OPTICAL CHARACTER RECOGNITION 


WHAT WORD DO YOU SEEK ? ** 


9.4 Exercises 


1. The glossary used by SEARCHER was created by hand using a text-editor 
program; but it is rather laborious to keep such a file up to date in this way, 
especially as it grows longer, because new entries must be inserted in the correct 
order. 
Write a program that will update this kind of file automatically. 

The program will read new words and their definitions from the user at the 
keyboard, and when enough have been given will sort them into alphabetical order 
and then merge them with the contents of the old glossary file to produce a revised 
glossary file. 

The merging process involves stepping through the file and the ordered list of 
new entries (probably an array) and comparing the current item from each — the 
lower being sent to the output file. If a new entry has the same word as an old one, 
drop the old one: it can be assumed to be a correction. 

Merging is just a way of combining two ordered lists into one longer list, also 
ordered. For this example one list will be an array, the other a file. The program 
fragment below exemplifies the logic of merging file F and array A onto file G. 
Your program would also have to take account of the fact that each entry in the 
dictionary is in fact a pair of strings, not just a single item. 


   reset(f);  rewrite(g);  p := 1; 
   while not (eof(f) or (p > newitems)) do 
   begin 
      if f^ < a[p] then  (* file entry is smaller *) 
      begin 
         g^ := f^;  get(f); 
      end 
      else  (* array entry is smaller or equal *) 
      begin 
         g^ := a[p];  p := p+1; 
         if f^ = a[p-1] then get(f); 
         (* ignore duplicates on f *) 
      end; 
      put(g); 
   end; 


   (* finish off whichever list is not empty *) 
   while not eof(f) do 
   begin 
      g^ := f^; 
      put(g);  get(f); 
   end; 
   while p <= newitems do 
   begin 
      g^ := a[p]; 
      put(g);  p := p+1; 
   end; 
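
The same logic can be tried out without any files. The sketch below is an illustration only: it merges two small ordered arrays of integers, with array F standing in for the old file, array A for the new items and array G for the output file, and the sample values are made up. A new item that matches an old one replaces it.

```pascal
program mergedemo (output);
(* merges two ordered arrays; a new item replaces an equal old one *)
const nf = 5;  na = 3;  ng = 8;
var f : array [1..nf] of integer;   (* stands in for the old file *)
    a : array [1..na] of integer;   (* the new items, already sorted *)
    g : array [1..ng] of integer;   (* the merged result *)
    i, p, n, k : integer;

procedure send(x : integer);
(* appends x to the output list g *)
begin
   n := n + 1;  g[n] := x;
end;

begin
   f[1] := 2;  f[2] := 4;  f[3] := 6;  f[4] := 8;  f[5] := 10;
   a[1] := 3;  a[2] := 6;  a[3] := 11;
   i := 1;  p := 1;  n := 0;
   while (i <= nf) and (p <= na) do
      if f[i] < a[p] then                  (* old entry is smaller *)
      begin  send(f[i]);  i := i + 1  end
      else                                 (* new entry is smaller or equal *)
      begin
         if f[i] = a[p] then i := i + 1;   (* drop the duplicate old entry *)
         send(a[p]);  p := p + 1;
      end;
   (* finish off whichever list is not empty *)
   while i <= nf do begin  send(f[i]);  i := i + 1  end;
   while p <= na do begin  send(a[p]);  p := p + 1  end;
   for k := 1 to n do write(g[k]:4);
   writeln;
end.
```

The merged list is 2 3 4 6 8 10 11: seven items, since the 6 present in both lists is kept only once.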


A fuller exposition of merging can be found in Chapter 13. 

The selection sort method given in Chapter 8 is rather inefficient for this task. A 
more suitable method is the Shell sort, for which a flowchart is shown in Fig. 9.2. 
Be sure that your program can cope with the case when the old file is empty or 
when no new items are given; and that it behaves correctly when the lists do not 
finish together, as will normally be the case. 


Figure 9.2 Shell sort (flowchart not reproduced; its legend reads 'N: No. of items 
in vector A; I, J, K, M are integers' and its central step is 'Swap A[I] with A[I+M]') 
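
For readers who would like to see the method in code, here is a minimal sketch of one common form of the Shell sort. It is not necessarily the exact variant drawn in Fig. 9.2: the gap M is simply halved on each pass, and the test data are invented. Items that are M places apart are compared and swapped when out of order, following each swap back down the array.

```pascal
program shelldemo (output);
(* Shell sort of a small vector of integers *)
const n = 10;
type vector = array [1..n] of integer;
var a : vector;
    k : integer;

procedure shellsort(var a : vector);
var i, j, m, t : integer;
begin
   m := n div 2;                      (* initial gap *)
   while m > 0 do
   begin
      for j := m + 1 to n do
      begin
         i := j - m;
         while i >= 1 do
            if a[i] > a[i+m] then     (* out of order: swap *)
            begin
               t := a[i];  a[i] := a[i+m];  a[i+m] := t;
               i := i - m;            (* follow the item back *)
            end
            else i := 0;              (* in order: stop this chain *)
      end;
      m := m div 2;                   (* narrow the gap *)
   end;
end;

begin
   for k := 1 to n do a[k] := (k * 7) mod 11;   (* 7 3 10 6 2 9 5 1 8 4 *)
   shellsort(a);
   for k := 1 to n do write(a[k]:3);
   writeln;
end.
```

When the gap has shrunk to 1 the final pass is an ordinary insertion-style sort, but by then the items are nearly in place, which is why the method does well on lists of moderate size.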


2. Write a program to produce a calendar for any year 1900 to 2099. 1 January 
1900 was a Monday. Put the output onto a text file so that many copies can be 
printed without re-running the program. 


3. Write a program to help ‘literary fingerprinting’. It has been found that writers 
of English (and other languages) are very consistent in certain usages, especially in 
the relative frequency of common words like ‘and’ and ‘the’. This fact is used in the 
detective work of determining authorship of anonymous or disputed passages or 
works. 

Your program should read a text file and produce the following statistics: 


(a) number of words and average word length; 
(b) number of sentences and average sentence length. 


In addition it should allow the user to type up to 12 short words, at the start of the 
program, and for each of these it should print at the end the number of occurrences 
and frequency expressed as a percentage of words read. 

You will have to think about how a ‘word’ is defined: perhaps as a sequence of 
letters leading up to a non-alphabetic character. (But what about ‘non-alphabetic’ 
or ‘don’t’?) And you will have to think harder about how to define a sentence. 
There is no perfect definition (because abbreviations like ‘etc.’ and ‘i.e.’ confuse the 
issue) but a ‘.’ or a ‘?’ or a ‘!’ immediately preceded by a letter is a reasonable one. 

Use this program to attempt to find the ‘odd one out’ of three passages, one 
written by one author and the other two by another. The sort of words to use as 
diagnostics are very simple ones like ‘a’, ‘of’, ‘the’, ‘to’ and so on. 


4. Write a program that, given a chemical formula, will find the molecular weight 
for that formula. The formulas will be given as an element name then a number of 
atoms, then an element name and a number of atoms, and so on. For example 


   h 2 o 1            water 
   h 2 s 1 o 4        sulphuric acid 
   na 1 cl 1          salt 
   o 2                molecular oxygen 


where even if an element occurs only once it is followed by a number (1) to make 
the input scan simpler. (The spaces are used as separators.) If one complete formula 
is given per line then EOLN can be used to test whether the formula is finished. 

At the start the program should read from a file a list of atoms and their atomic 
weights. This should include all elements lighter than zinc (ZN 65.37) and a few 
other heavier ones like lead and uranium (PB 207.19 and U 238.03) which are 
interesting. The first four entries might look something like this. 


   h 1.00797 
   he 4.0026 
   li 6.939 
   be 9.0122 


If your version of Pascal permits it, use an enumerated type whose constants are 
the international symbols for the elements. This can then be used in the input phase 
and also as an index type for the array of atomic weights. (Alternatively you could 
use the Binary Search to locate the element names in a table, in which case they 
would need to be ordered alphabetically.) 


Sets and records 


The last two data aggregation concepts we deal with are the set and the record. 


10.1 Sets and set operations 
The idea of the set is at the foundation of mathematics. Pascal provides a way 
of handling sets, though not in their full generality. 


10.1.1 What is a set? 


A set is a collection of objects. In Pascal a set contains zero or more members all of 
one type. To declare a set type one uses the notation 


set of base-type 
where base-type is a scalar (not REAL). For example 
type charset = set of char; 


defines a type of set whose constituent members must be characters. 

In any set every value of the base type is either present or absent; so a set is 
a particular combination of values. 

A set may have no members at all (‘the empty set’) or it may have any number 
up to a fixed limit — the number of values in the base type. The maximum possible 
number of items in a Pascal set may be quite small. On the CDC 6600 it is 59; 
on the DEC System-10 it is 72; on many microcomputers it is 255. Set types 
which could contain more than this maximum are not allowed. So 


set of integer; 

would not be permitted, and even 
set of char; 

might not be acceptable, although, for instance, 
set of 0..48; 


always would be. Even with this restriction, sets are quite useful. 


10.1.2 Setting the scene 


Set variables are merely variables of a set type, as in 


   type jantodec = (jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec); 
        monthset = set of jantodec; 

   var m : jantodec; 
       mm : monthset; 


where M can be any month and MM can be any collection of months. 

To denote a set in a Pascal program you enclose a list of elements within square 
brackets. There may be no elements (the empty set), one element or a list of 
elements separated by commas. These elements may be constants, variables or 
expressions of the appropriate base type, or a pair of elements with two dots 
between — indicating a range of values. Thus 


   mm := [];  (* empty *) 
   mm := [jan];  (* january only *) 
   mm := [jan..mar,aug,nov..dec]; 
      (* january to march, august, november to december *) 
   mm := [feb,m];  (* february and the value of m *) 


are all legal set assignments. 
Consider also 
   m := oct; 
as opposed to 
   mm := [oct]; 
and be sure that you understand the difference. 
Sets and set operations can be implemented very efficiently because the 
computer’s underlying binary store lends itself naturally to set representation. 
Thus given objects 


   var hue1,hue2 : set of (red,yellow,green,blue); 


the compiler would allot 4 bits to each object 


   r y g b 


so that the sets [RED, YELLOW] and [YELLOW, GREEN, BLUE] could be 
held as follows 


   HUE1 := [RED, YELLOW];      HUE2 := [YELLOW, GREEN, BLUE]; 

      r y g b                     r y g b 
      1 1 0 0                     0 1 1 1 


where 1 signifies presence and 0 absence of a particular value in the base type. 

(This should explain why the size limitation discussed in Section 10.1.1 is 
imposed: it has to do with the length of the computer word.) 
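
The bit patterns can be displayed by testing each possible member in turn. In this small sketch the type names COLOUR and HUESET and the procedure SHOW are introduced only for the illustration; the program prints a 1 or a 0 for each value of the base type, which is a picture of the representation rather than an inspection of the machine word itself.

```pascal
program huebits (output);
(* prints each set as a row of 1s and 0s, one per base-type value *)
type colour = (red, yellow, green, blue);
     hueset = set of colour;
var hue1, hue2 : hueset;

procedure show(s : hueset);
var c : colour;
begin
   for c := red to blue do
      if c in s then write(' 1') else write(' 0');
   writeln;
end;

begin
   hue1 := [red, yellow];
   hue2 := [yellow, green, blue];
   writeln(' r y g b');
   show(hue1);               (* 1 1 0 0 *)
   show(hue2);               (* 0 1 1 1 *)
end.
```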


10.1.3 Set operations 


Just creating sets and assigning them to variables would not exploit the power 
of the set data type. Accordingly the set operators of union, intersection and 
difference are defined. Where L and R are sets 


   L + R   (union) is the set of items in either set or both; 
   L * R   (intersection) is the set of items common to both; 
   L - R   (difference) is those items in L which are not also members of R. 


To be specific, given the declarations 


   type girl = (anna,eleanora,frances,jane,lulu,mary,sara,suzanne); 

   var nice,willing : set of girl; 

the assignments 

   nice := [anna,frances,mary..suzanne]; 
   willing := [anna,lulu,mary,sara]; 


could be used to create two sets — those girls who are nice and those who are 
willing. This situation can be represented by a diagram. 


                  WILLING          NOT WILLING 

   NICE           ANNA             FRANCES 
                  MARY             SUZANNE 
                  SARA 

   NOT NICE       LULU             ELEANORA 
                                   JANE 


Then 


   nice + willing 


defines the set of all girls who are either nice or willing or both (i.e. ANNA, 
FRANCES, LULU, MARY, SARA and SUZANNE); while 


   nice * willing 


consists of all girls who are both nice and willing (i.e. the top left quadrant in the 
table, namely ANNA, MARY and SARA). Finally 


   nice - willing 


consists of the nice girls who are unwilling, i.e. FRANCES and SUZANNE, in the 
upper right quadrant. 

There is also a set membership operator IN which takes a value of the base type 
as left operand and a set as right operand and produces a BOOLEAN result — true 
if the value is in the set, false otherwise. Thus 


lulu in willing 
would be true, and 
jane in (nice + willing) 


would be false. This is very convenient for testing whether characters are in a set, 
for example 


   ch in ['a','e','i','o','u','y'] 

or 

   ch in ['0'..'9', 'a'..'z'] 

which save time and space. For instance the latter example replaces 

   ((ch >= '0') and (ch <= '9')) or ((ch >= 'a') and (ch <= 'z')) 


rather more succinctly. 
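
The operators can be seen at work in a short sketch on sets of characters. The names VOWELS, EARLY and SHOW are invented here, and the program assumes an installation that accepts SET OF CHAR, which (as noted in Section 10.1.1) not all do.

```pascal
program setops (output);
(* union, intersection, difference and membership on sets of char *)
type charset = set of char;
var vowels, early : charset;
    ch : char;

procedure show(s : charset);
(* writes the lower-case letters that are members of s *)
var c : char;
begin
   for c := 'a' to 'z' do
      if c in s then write(c);
   writeln;
end;

begin
   vowels := ['a','e','i','o','u'];
   early := ['a'..'h'];
   show(vowels + early);     (* union:        abcdefghiou *)
   show(vowels * early);     (* intersection: ae *)
   show(early - vowels);     (* difference:   bcdfgh *)
   ch := 'e';
   if ch in vowels then writeln(ch, ' is a vowel');
end.
```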

Finally certain relational operators can be applied to sets. The operators = and 
<> test strict equality and inequality. The operators <= and >= test set inclusion. 
So it is easy to find out if one set is a subset of another. Thus 


   [anna,sara] <= (willing * nice) 

is true because all members on the left are contained in the set on the right. Also 

   ['a'..'z'] >= ['a','e','i','o','u'] 

is true, though 

   ['a'..'z'] >= ['a','b','#','$'] 

is false because '#' and '$' are not included in the letters. 

The priority of the set operators is as follows, from high to low. 


   * 
   +  - 
   in  =  <>  <=  >= 


Operators on the same line have equal precedence, and are evaluated left to right in 
expressions. So you should exercise care over bracketing. 
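
A few comparisons, written out with explicit brackets, may make this concrete. The sets ODDS and LOW below are invented for the sketch, and the program relies on WRITELN accepting BOOLEAN values; the comments give the truth value of each expression.

```pascal
program subsets (output);
(* set comparison: equality, inequality, inclusion and membership *)
type digits = set of 0..9;
var odds, low : digits;
begin
   odds := [1,3,5,7,9];
   low := [0..4];
   writeln([1,3] <= odds);            (* true: [1,3] is a subset of odds *)
   writeln(low >= [0,5]);             (* false: 5 is not in low *)
   writeln((odds * low) = [1,3]);     (* true: the common members *)
   writeln(odds <> low);              (* true: the sets differ *)
   writeln(3 in (odds * low));        (* true *)
end.
```

The brackets round ODDS * LOW are not strictly needed, since * binds more tightly than the relational operators, but they cost nothing and spare the reader from consulting the table.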


10.1.4 Setting an example 
Here is a little example where the logic of sets is used to assist the process of 
deduction such as is found in the best detective stories. The listing is followed by 
a sample run. Together they should be self-explanatory. 


program sherlock; 
(* detective work *) 


type persons = 
        (actress,colonel,countess,baronet, 
         drwatson,maid,minister,mp, 
         moriarty,narrator,sheepdog,stranger); 
     location = (ballroom,bathroom,bedroom,cellars, 
                 gardens,hall,kitchen,library); 


var suspects,tempted,able,away,good : set of persons; 
    murderer,body : persons; 
    room : location; 


begin  (* process of elimination *) 
   suspects := [actress..stranger]; 
      (* no one is above suspicion *) 
   writeln('so watson, you are baffled eh?'); 
   writeln('well, you know my methods: let us apply them.'); 
   writeln('first tell me who was murdered'); 
   read(body); 
   if body = drwatson then 
      writeln('a dastardly deed! you have my sympathy.'); 
   suspects := suspects - [body]; 
      (* suicide not suspected *) 
   writeln('now tell me who wanted to kill ',body); 
   read(tempted); 
   suspects := suspects * tempted; 
   write('where did the crime take place ? ');  read(room); 
   write('who had access to ',room,' at that time ? '); 
   read(able); 
   suspects := suspects * able; 
   write('does anyone have a cast-iron alibi ? '); 
   read(away); 
   suspects := suspects - away;  (* could not have done it *) 
   write('which ones are too honest to do such a foul deed ? '); 
   read(good); 
   suspects := suspects - good;  (* would not have done it *) 
   writeln; 
   if suspects = [] then 
      writeln('i fear you are not being candid with me.') 
   else 
   if card(suspects) = 1 then 
   begin 
      murderer := actress; 
      while not (murderer in suspects) and (murderer<stranger) 
         do murderer := succ(murderer);  (* find the culprit *) 
      writeln('the criminal is ',murderer); 
   end 
   else 
      writeln('seek your villain among: ',suspects); 


   writeln('when you have eliminated the impossible,'); 
   writeln('whatever remains, however improbable,'); 
   writeln('must be the truth.'); 
end. 


A sample run follows. User input is in lower case and computer output in 
upper case, to clarify the dialogue. 


SO WATSON, YOU ARE BAFFLED EH? 
WELL, YOU KNOW MY METHODS: LET US APPLY THEM. 
FIRST TELL ME WHO WAS MURDERED 
drwatson 
A DASTARDLY DEED! YOU HAVE MY SYMPATHY. 
NOW TELL ME WHO WANTED TO KILL DRWATSON 
[actress..narrator] 
WHERE DID THE CRIME TAKE PLACE ? bathroom 
WHO HAD ACCESS TO BATHROOM AT THAT TIME ? 
[actress, countess, maid, narrator..stranger] 
DOES ANYONE HAVE A CAST-IRON ALIBI ? [actress, narrator] 
WHICH ONES ARE TOO HONEST TO DO SUCH A FOUL DEED ? 
[sheepdog, maid] 

THE CRIMINAL IS COUNTESS 
WHEN YOU HAVE ELIMINATED THE IMPOSSIBLE, 
WHATEVER REMAINS, HOWEVER IMPROBABLE, 
MUST BE THE TRUTH. 


In this light entertainment READ and WRITE are used with variables of 
enumerated type (BODY, ROOM and MURDERER) and with set variables. 
These facilities are extensions to standard Pascal, but most convenient. Set I/O 
allows sets to be read as they would appear in a program, except that the elements 
must all be constants. They are output in the same format. 

The function CARD, used to evaluate CARD(SUSPECTS), yields the 
cardinality of a set; that is, the number of members in the set. 
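
Since CARD is itself an extension, an installation that lacks it needs a substitute. One possibility, sketched below, is to step through every value of the base type and count those that are members; the function name MEMBERS and the type name PERSONSET are made up for this illustration.

```pascal
program cardinal (output);
(* counts the members of a set without the non-standard CARD *)
type persons = (actress,colonel,countess,baronet,
                drwatson,maid,minister,mp,
                moriarty,narrator,sheepdog,stranger);
     personset = set of persons;

function members(s : personset) : integer;
(* number of values of the base type that are present in s *)
var p : persons;
    n : integer;
begin
   n := 0;
   for p := actress to stranger do
      if p in s then n := n + 1;
   members := n;
end;

begin
   writeln(members([]):1);                                  (* 0 *)
   writeln(members([countess]):1);                          (* 1 *)
   writeln(members([actress..narrator] - [drwatson]):1);    (* 9 *)
end.
```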


10.2 The use of records 


A record, like an array, is a collection of data. It differs from the array in two main 
respects: the components of the record are selected by name (not by position) and 
those components do not all have to be of the same type. 

Records are used where several pieces of information related to a particular 
entity need to be grouped together into a unit. For example a student record 
might comprise 


   forename 
   surname 
   date of birth 
   course of study 


and other relevant details on an individual student; the catalogue record for each 
book in a library might contain 


   author 
   title 
   date of publication 
   publisher 
   retail price 
   shelf number 
   subject category 
   lending status 


and so on. 


10.2.1 Declaring records 


In Pascal the reserved word RECORD introduces a data structure with a number of 
components called ‘fields’. The name and type of each field must be specified. 
Each field name is a Pascal identifier chosen by the programmer. The record may 
have any number of fields. 

The record definition is closed by the reserved word END. For example 


type date = 
        record 
           d : 1..31; 
           m : 1..12; 
           year : 0..9999; 
        end;  (* of date *) 

     student = 
        record 
           surname : packed array [1..20] of char; 
           initial1, initial2 : char; 
           birthday : date;  (* note nesting *) 
           subjects : set of 
                         (math, stat, computer, physics); 
           exammark : array [2..3,1..8] of 0..100; 
              (* exam results for 2 years in 8 courses *) 
           male : boolean;  (* sex discrimination *) 
           feespaid : real;  (* number of pounds received *) 
        end;  (* of student record *) 


declares two kinds of record, DATEs and STUDENTs. A DATE contains three 
fields called D, M and YEAR. All three are subranges of the INTEGER type. A 
STUDENT contains eight fields of assorted types, one of which is itself a DATE. 

After defining these types, variables of type DATE or STUDENT can be 
declared. Thus 


var 
   firstday, doomsday : date; 
   s1, s2 : student; 
   college : array [1..2500] of student; 


declares two DATEs, two STUDENTs and an array of 2500 STUDENTS called 
COLLEGE. 

Notice that arrays of records are possible and that a record may contain arrays, 
sets and other records as well as scalar fields. Files of records are also allowed, and 
indeed most useful, but records cannot have any fields which are files. 


10.2.2 Breaking records 
To select a component of a record in Pascal one writes a dot after the record 
identifier and follows it by the name of the field concerned. Thus 


firstday.d := 10; 
firstday.m := 3; 
firstday.year := 1981; 


assigns 10 March 1981 to the DATE variable FIRSTDAY. 
With record variables the type of item depends on the field selected; so 


    s1
is of type STUDENT, while
    s1.birthday
is of type DATE and
    s1.birthday.year


is an integer between 0 and 9999. 

The dot is probably best pronounced as ‘’s’ (the possessive). Thus one says 
‘FIRSTDAY’s YEAR’ for FIRSTDAY.YEAR when reading a program aloud. 

A selected field may appear anywhere that a value of its type is permitted, not 
just in assignments: 


    if s1.birthday.year < 1960 then
      writeln('mature student');


or 
writeln(s1.feespaid/2 :12:2); 


are examples. 
Assignment of complete records is quite acceptable, so that there is no need 
to say 


doomsday.d := firstday.d; 
doomsday.m := firstday.m; 
doomsday.year := firstday.year; 


when 
doomsday := firstday; 
will do. A record assignment copies the contents of all fields. 
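
Field selection and whole-record assignment can be tried out in a few lines. The program below is only a sketch: the name RECDEMO and the array TERM are invented for illustration, and DATE is the type from Section 10.2.1. It should print 1981, 1999 and 9, showing that changing the copy leaves the original alone.

```pascal
program recdemo(output);
(* hypothetical demo: field selection and whole-record assignment *)
type
  date = record
           d : 1..31;
           m : 1..12;
           year : 0..9999
         end;
var
  firstday, doomsday : date;
  term : array [1..3] of date;   (* invented array of records *)
begin
  firstday.d := 10;  firstday.m := 3;  firstday.year := 1981;
  doomsday := firstday;          (* copies all three fields *)
  doomsday.year := 1999;         (* alters the copy only *)
  term[2] := firstday;           (* an array element is a whole record *)
  term[2].m := 9;
  writeln(firstday.year:5, doomsday.year:5, term[2].m:3);
end.
```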
10.2.3. The WITH statement 
To set up a complete student record is a laborious process. One might begin 


    s2.surname := 'Forsyth             ';
    s2.initial1 := 'r';
    s2.initial2 := 's';


and then baulk at the prospect of continuing — and this is not even a big record as 
such things go. 

Fortunately Pascal provides a less long-winded solution. When many fields 
of the same record are being processed the WITH statement 


WITH record-name DO statement 


allows the programmer to mention the record name once and then treat the field 
names as variables, without repeating the record name. So


    with s2 do begin
      surname := 'Forsyth             ';
      initial1 := 'r';  initial2 := 's';
      with birthday do begin
        d := 2; m := 10; year := 1948;
      end; (* inner with *)
      subjects := [stat, computer];
      for i := 2 to 3 do (* 2nd and 3rd year *)
        for j := 1 to 8 do (* 8 course units *)
          exammark[i,j] := 0;
      feespaid := 0.0;
      male := true;
    end; (* of outer with *)


saves repeating S2 ten times and BIRTHDAY three.

WITH establishes a context that applies to the statement following the DO 
within which components of a record may be referred to by field name only — 
without mentioning the record identifier, or the dot. 

The form WITH a,b DO s is shorthand for WITH a DO WITH b DO s. 
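
As a rough illustration of that equivalence, the sketch below fills a record with nested WITH statements and then reads the same fields back through the WITH a,b shorthand. The type PERSON and the program name are assumptions made up for this example; they are not part of the student record above.

```pascal
program withdemo(output);
(* hypothetical demo of nested WITH and the WITH a,b shorthand *)
type
  date = record d : 1..31; m : 1..12; year : 0..9999 end;
  person = record
             initial : char;
             birthday : date
           end;
var p : person;
begin
  with p do
    begin
      initial := 'r';
      with birthday do
        begin d := 2; m := 10; year := 1948 end
    end;
  (* the same fields reached through the shorthand form *)
  with p, birthday do
    writeln(initial, d:3, m:3, year:5);
end.
```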


10.2.4 Is this a record? 


Here is a simple example, exhibiting most of the important features of records in 
use. 

We assume that a file has been created containing student records very similar 
to the type declared in Section 10.2.1. The program SELECTOR reads through this 
file and sets the category field to the grade determined by the average mark for 
every student. The grading scheme is as follows. 


    VERYGOOD    over 79 %
    GOOD        60..79 %
    PASS        40..59 %
    FAIL        under 40 %


Having done so, it selects those students doing statistics or computing or both 
for further processing. For each selected student it prints the initials, surname, 
average mark and grade on a line of the standard output file OUTPUT. Additionally 
it accumulates separate success and failure counts for both sexes so that at the end
it can print

    the total number of students selected,
    the number of male and female successes and failures among the selected
    students.

It also writes the graded records, of all students, to a new file NEWSTUDS.


    program selector (students,newstuds,output);


    type cats = (fail,pass,good,verygood);
         date =
           record d : 1..31; m : 1..12; year : 0..9999 end;
         student =
           record
             surname : packed array [1..20] of char;
             initial1, initial2 : char;
             birthday : date;
             subjects : set of (math,stat,computer,physics);
             mark : array [2..3,1..8] of 0..100;
               (* exam marks for 2 years in 8 courses *)
             male : boolean;
             category : cats;
           end; (* of student definition *)


    var s : student;
        students, newstuds : file of student;
        i,j,tote,selected : integer; av : real;
        failures : array [boolean,boolean] of 0..maxint;
          (* 1st subscript female/male, 2nd pass/fail *)
        b,c : boolean;


    begin (* main line *)
      reset(students); rewrite(newstuds);
      selected := 0;

      for b := false to true do
        for c := false to true do failures[b,c] := 0;

      writeln('fullname':25,' average','class':12);

      (* main i/o loop *)
      while not eof(students) do
        begin
          s := students^; (* current record *)
          with s do
            begin
              (* compute total mark *)
              tote := 0;
              for i := 2 to 3 do
                for j := 1 to 8 do tote := tote + mark[i,j];
              av := tote / 16;
              tote := round(av); (* average as whole number *)
              if tote >= 80 then category := verygood
              else if tote >= 60 then category := good
              else if tote >= 40 then category := pass
              else category := fail;
              if subjects * [stat,computer] <> [] then
                (* taking stats or computing or both *)
                begin
                  selected := selected + 1;
                  writeln(initial1,initial2:2,surname:22,
                          av:8:2,category:12);
                  c := (category = fail);
                  failures[male,c] := failures[male,c] + 1;
                  (* increment cell in fourfold table *)
                end;
            end; (* of with *)
          newstuds^ := s;
          put(newstuds); get(students);
        end; (* of while *)

      writeln;
      writeln('no. of students selected = ',selected:8);
      writeln('sex':8,'fail':8,'ok':8);
      writeln('m':8,failures[true,true]:8,failures[true,false]:8);
      writeln('f':8,failures[false,true]:8,failures[false,false]:8);

    end.


10.3. Example program [BIGDEAL] 


This program deals Bridge hands. Then it scores them and prints them out in 
traditional North-West-East-South layout. Many interesting aspects of the use 
of sets and records crop up in the attempt to represent playing cards. 

But first, we must be able to shuffle the pack; and to do so we need a random 
number source. 

The digital computer, at least when functioning to specification, is a 
deterministic machine; therefore any work with pretensions to originality 
depends on a steady supply of random (or, more correctly, pseudo-random) 
numbers. 

Pascal lacks a built-in random number generator; so we put together a 
home-made random number function, based on the principle of Multiplicative 
Congruence — described by, among others, Gimpel (1976). This boils down, in 
Pascal terms, to an assignment 


rand := (rand *c +k) mod m; 


where a new random integer is obtained by multiplying the old one by a constant 
C, adding a second constant K and taking the modulus by a third M. The choice 
of these ‘magic constants’ is crucial. For instance C and K must not share common
divisors. Here are some triplets that work reasonably well. (They will all give
integer overflow on small machines using less than 24 bits to store an integer:
see Chapter 15 for a way of getting round this problem.)


           C         K         M
        1061      3251     10000
         735         0     10657
         824         0     10657
         125         1     16384
        4676         0    414971
        3141    110795    524288
        8705         0    532333


Since every number is generated by applying a formula to the previous one the 
sequence is, of course, not truly random. However, by starting the sequence with a 
genuinely haphazard value, based on the system clock for instance, a suitably 
unpredictable sequence can be produced. 

Randomness is an elusive quality. It is customary to sidestep the interesting but 
time-consuming philosophical problems involved in its definition by accepting a 
series as random, for a given purpose, if it passes certain statistical tests aimed at 
uncovering hidden regularities. 

The function FATE (Section 10.3.1) has been tested by the author and (1) 
has a sufficiently even distribution, (2) lacks gross serial correlation of adjacent 
items. In terms of coin tossing, the likelihood of heads equals that of tails and the 
probability of a head or tail is unaffected by the outcome of the previous throw. 
(The latter condition is usually harder to satisfy.) 

The underlying sequence consists of integers between 0 and M - 1. FATE(N1,N2)
scales these to yield integral values in the range given by the two input parameters, 
for convenience. Thus FATE(1,6) simulates a dice throw. 
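
The recurrence and the scaling step can be sketched as follows. This is not the FATE function of the listing: it uses the smaller triplet C = 125, K = 1, M = 16384 from the table so that the product stays well inside 32 bits, it assumes a LONGINT type (a 32-bit integer in Free Pascal and Turbo Pascal, not part of standard Pascal), and the names LCGDEMO and ROLL are invented. The seed is fixed, so every run repeats the same ten dice throws.

```pascal
program lcgdemo(output);
(* sketch only: congruential generator with C = 125, K = 1,
   M = 16384; longint is an assumed 32-bit integer type *)
const c = 125; k = 1; m = 16384;
var seed : longint;
    i : integer;

function roll(n1, n2 : integer) : integer;
  (* hypothetical helper: next value scaled to the range n1..n2 *)
begin
  seed := (seed * c + k) mod m;        (* always in 0..m-1 *)
  roll := trunc(seed / m * (n2 - n1 + 1)) + n1;
end;

begin
  seed := 1981;                        (* fixed seed: repeatable run *)
  for i := 1 to 10 do write(roll(1, 6):2);
  writeln;
end.
```

Because SEED / M is always less than one, the scaled value can never exceed N2.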

Now we are ready to think about shuffling. Many card game simulations use 
methods of shuffling that are grossly inefficient or simply do not randomize the 
pack. The following algorithm (Green, 1963) is simple, quick and it works. 
Assume that the items are in DECK: ARRAY [1 ..52] OF CARD. 


1. Set C = 1; N = 52.
2. Choose a random integer J in the range C..N.
3. Exchange DECK[C] with DECK[J].
4. Add 1 to C.
5. If C < N return to step 2, otherwise finish.


It entails only one pass through the array. (Any method based on assigning random 
values to each card and then sorting on those values must be worse.) 
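
The five steps above can be tried on a small pack of numbered cards. The sketch below is an illustration under stated assumptions: the pack holds only ten cards, the generator is the small 125, 1, 16384 triplet rather than the one used in the full program, LONGINT is assumed to be a 32-bit integer, and the names SHUFDEMO, CARDS and PICK are invented.

```pascal
program shufdemo(output);
(* sketch of the exchange shuffle on a pack of 10 numbered cards *)
const npack = 10;
      c = 125; k = 1; m = 16384;
var cards : array [1..npack] of integer;
    seed : longint;                    (* assumed 32-bit integer *)
    i, j, temp : integer;

function pick(lo, hi : integer) : integer;
  (* hypothetical helper: pseudo-random integer in lo..hi *)
begin
  seed := (seed * c + k) mod m;
  pick := trunc(seed / m * (hi - lo + 1)) + lo;
end;

begin
  seed := 4321;
  for i := 1 to npack do cards[i] := i;
  for i := 1 to npack - 1 do           (* steps 2 to 5 *)
    begin
      j := pick(i, npack);             (* j lies in i..npack *)
      temp := cards[i];  cards[i] := cards[j];  cards[j] := temp;
    end;
  for i := 1 to npack do write(cards[i]:3);
  writeln;
end.
```

Whatever order comes out, each of the numbers 1 to 10 appears exactly once, since the only operation applied to the array is an exchange.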
This algorithm is the basis for the procedure SHUFFLE in the program. Once we know
how to shuffle, a top-level program outline emerges.


    repeat
      shuffle;
      (* deal out the cards *)
      (* score the hands *)
      (* print them out *)
    until (* enough done *) ;


We can now turn to data representation. 
A card is defined by its rank (2 .. 10, J, Q, K, A) and its suit (clubs, diamonds, 
hearts, spades). A record of the form 


type card= 
record 
cardrank : rank; 
cardsuit : suit; 
end; 


is indicated. But wait! What have we just done? We have made the standard 
function CARD (giving the cardinality of a set) inaccessible to our program, and we 
shall need it. Oh well, we can write one ourselves: it is called HOWMANY in the 
program. (The obvious solution is to re-name our type, as KARD or CARDTYPE
or some such, but I want to play cards not kards or cardtypes.)

A player’s hand is a list of cards which starts empty and grows to 13 items. We 
could use a 13-element array and insert items as dealt; but then we would have to 
sort each hand for printout because we want the output to show, for each player, 
all the spades in descending rank, then the hearts in descending rank, and so on. 
The most convenient data structure turns out to be an array of sets. (Since we will 
score each hand the record for each player contains a field for its value, as 
determined by the conventional scoring system.) 


    type
      cardset = set of rank;
      fistfull = (* hand of one player *)
        record
          suitcase : array [suit] of cardset;
            (* set of cards held in each suit *)
          eval : 0..99; (* point score *)
        end;


Note that we end up with a representation of cards in the hand different from that 
of cards in the deck. This is nothing to be surprised or worried about. 
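
To see why an array of sets saves sorting, consider the sketch below. It is a cut-down illustration, not part of BIGDEAL: three made-up cards are put into a hand in arbitrary order and then listed suit by suit in descending rank simply by testing set membership. Suits are printed as ordinal numbers (3 for spades down to 0 for clubs) to stay within standard output facilities.

```pascal
program handdemo(output);
(* sketch: a hand held as one set of ranks per suit, so cards come
   out in rank order however they went in; the cards are made up *)
type suit = (cl, di, he, sp);
     rank = 2..14;
     cardset = set of rank;
var suitcase : array [suit] of cardset;
    this : suit;
    r : rank;
begin
  for this := cl to sp do suitcase[this] := [];
  suitcase[sp] := suitcase[sp] + [9];      (* nine of spades *)
  suitcase[sp] := suitcase[sp] + [14];     (* ace of spades *)
  suitcase[he] := suitcase[he] + [11];     (* jack of hearts *)
  for this := sp downto cl do
    begin
      write(ord(this):2, ':');
      for r := 14 downto 2 do
        if r in suitcase[this] then write(r:3);
      writeln;
    end;
end.
```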
Let us look at the program itself. 


10.3.1 Program listing


    program bigdeal;


    type suit = (cl,di,he,sp);
         rank = 2..14; (* deuce..ace *)
         players = (n,w,e,s);
         card =
           record
             cardrank : rank;
             cardsuit : suit;
           end;
         cardset = set of rank;
         fistfull =
           record
             suitcase : array [suit] of cardset;
               (* four sets of cards *)
             eval : 0..99;
           end;
         deck = array [1..52] of card;


    var
      carddeck : deck;
      hand : array [players] of fistfull;
      name : array [11..14] of char;
        (* for court cards: j,q,k,a *)
      nd,i,j,seed : integer;
        (* seed used by random number function *)
      p : players;
      r : rank; this : suit;

    function fate (n1,n2 : integer) : integer;
      (* random number generator: uses global seed *)
      const c = 8705; m = 532333;


    begin
      if seed = 0 then (* re-start series *)
        seed := clock mod m;
          (* based on time of day in msec. *)
      seed := (seed * c) mod m;
      fate :=
        trunc(seed/m * (n2-n1+1)) + n1;
        (* scaled to integer range n1..n2 *)
    end; (* of fate *)


    function howmany(cs : cardset) : integer;
      (* computes cardinality of cs *)
      var
        i : rank; nc : integer;


    begin
      nc := 0;
      for i := 2 to 14 do
        if i in cs then nc := nc + 1;
      howmany := nc;
    end; (* of howmany *)


    procedure evaluate(var h : fistfull);
      (* conventional scoring for bridge hand *)
      var nc,pval : integer;
          cr : rank; suitable : suit;


    begin
      pval := 0;
      for suitable := cl to sp do
        with h do
          begin
            nc := howmany(suitcase[suitable]);
              (* no. of cards in this suit *)
            if nc = 0 then pval := pval + 3
            else if nc = 1 then pval := pval + 2;
              (* points for void or singleton *)

            for cr := 11 to 14 do (* court cards *)
              if (cr in suitcase[suitable])
                 and (nc > (14-cr)) then
                pval := pval + (cr-10);
              (* 1 for j if nc>3, 2 for q if nc>2,
                 3 for k if nc>1, 4 for a *)
              (* if you can do it more tidily,
                 let me know *)
          end;
      h.eval := pval;
    end; (* of evaluate *)


    procedure outline(lmargin : integer; thislot : cardset;
                      suitname : suit);
      (* prints out suit from one hand *)
      var i,gaps : integer;
          r : rank;


    begin
      for i := 1 to lmargin do write(' ');
        (* leading spaces *)
      gaps := 12;
        (* n.b. no room for 13 in one suit! *)

      write(suitname:2);
      for r := 14 downto 2 do
        if r in thislot then
          begin gaps := gaps - 1;
            if r > 10 then write(name[r]:3)
            else write(r:3); (* court cards are named *)
          end;
      for i := 1 to gaps do write('   ');
        (* trailing spaces *)
    end; (* of outline *)


    procedure shuffle(var pack : deck; i,n : integer);
      (* shuffles deck from i to n *)
      var j : integer; temp : card;


    begin
      while i < n do
        begin
          j := fate(i,n);
          (* interchange card i with card j *)
          temp := pack[i];
          pack[i] := pack[j];
          pack[j] := temp;
          i := i + 1;
        end;
    end; (* of shuffle *)


    begin (* main line *)
      seed := 0;
      (* initialize the carddeck *)
      r := 2; this := cl; (* start with 2 of clubs *)
      for i := 1 to 52 do
        begin
          carddeck[i].cardrank := r;
          carddeck[i].cardsuit := this;

          if r = 14 then
            begin
              r := 2;
              if this < sp then this := succ(this);
            end
          else r := r + 1; (* next card *)
        end; (* only needs to be done once *)

      name[11] := 'j'; name[12] := 'q';
      name[13] := 'k'; name[14] := 'a';
        (* names for court cards *)

      write('how many deals ? '); read(nd);

      j := 0;

      (* main loop *)
      repeat j := j + 1;
        writeln('the new deal .... number',j:7);
        writeln;

        shuffle(carddeck,1,52);
        (* also empty players' hands *)
        for p := n to s do
          with hand[p] do
            for this := cl to sp do suitcase[this] := [];

        (* the new deal *)
        p := s;
        for i := 1 to 52 do
          begin
            if p = s then p := n (* north *)
            else p := succ(p); (* next player *)
            r := carddeck[i].cardrank;
            this := carddeck[i].cardsuit;
            hand[p].suitcase[this] :=
              hand[p].suitcase[this] + [r];
              (* add to p's hand in this suit *)
          end;

        (* score the hands *)
        for p := n to s do evaluate(hand[p]);


        (* display results *)
        write('north':25);
        with hand[n] do
          begin
            writeln('points = ':16,eval:4);
            writeln;
            for this := sp downto cl do
              begin
                outline(20,suitcase[this],this);
                writeln;
              end;
            writeln;
          end; (* of with *)

        (* now west and east *)
        write('west','points =':16,hand[w].eval:4,' ');
        writeln('east':16,'points =':16,hand[e].eval:4);
        writeln;
        for this := sp downto cl do
          begin
            outline(0,hand[w].suitcase[this],this);
            outline(0,hand[e].suitcase[this],this);
            writeln;
          end;
        writeln;

        (* south as for north *)
        writeln('south':25,'points =':16,hand[s].eval:4);
        writeln;
        with hand[s] do
          for this := sp downto cl do
            begin
              outline(20,suitcase[this],this);
              writeln;
            end;
        writeln; writeln;

      until j >= nd;

    end.


10.3.2 Sample run


Here is a sample run of the program on the DEC System-10.

HOW MANY DEALS? 2

THE NEW DEAL .... NUMBER 1

                    NORTH        POINTS =  11

                    SP  K 10  9  8  6
                    HE  8
                    DI  A 10  8  6
                    CL  Q 10  7

WEST        POINTS =   8              EAST        POINTS =  11

SP  Q  3                              SP  A  7  4  2
HE  K  Q  J  9  7  6  3               HE  4  2
DI  J  5  4                           DI  K  9  2
CL  9                                 CL  A  8  6  3

                    SOUTH        POINTS =  10

                    SP  J  5
                    HE  A 10  5
                    DI  Q  7  3
                    CL  K  J  5  4  2


THE NEW DEAL .... NUMBER 2

                    NORTH        POINTS =  15

                    SP 10  2
                    HE  A  Q  4
                    DI  A  Q  4
                    CL  K  9  8  4  3

WEST        POINTS =   5              EAST        POINTS =  10

SP  K  8  5  4  3                     SP  Q  9  6
HE 10  5                              HE  K  8  2
DI 10  5                              DI  J  7  3  2
CL  Q  7  6  2                        CL  A 10  5

                    SP  A  J  7
                    HE  J  9  7
                    DI  K  9  8
                    CL  J


10.4 Exercises


1. In Section 10.2.4 an example program was shown that read through a file called 
STUDENTS. Write a program that will read student particulars in a suitable format 
from a TEXT file (created using an editor) and write them into the file of STUDENT 
records read by SELECTOR. 

At the same time think about the problems of updating such a file. It is easy 
enough to create or read it; but revising the contents of a sequential file involves 
reading through it and writing the revised information out onto a new file — which 
then replaces the old one. 

What happens typically is that the amendments are grouped into a batch and 
stored on a ‘transaction file’. When this is ready the old ‘master file’ and the
transaction file are read together and the collated data put on a new master file. 
This is only practicable if both transaction and master file are ordered on the same 
key (see Fig. 10.1). 


[Figure 10.1 Master file update: the old master file (brought forward) and the
transaction file feed a collate/merge step that writes the new master file
(carried forward).]


2. Typhoid Tours Ltd, the package-holiday subsidiary of Hijack International 
Airlines, holds information about its past customers on a sequential file of records. 
The record layout is as follows: 


    customer name                32 characters
    customer address             record
    year of last trip            number
    year of last-but-one trip    number


The customer address is itself a record, as follows.


    road         20 characters
    town         20 characters
    district     16 characters
    postcode      8 characters

At the start of every year they go through the file and remove any customers
who have not travelled for five years. This is done by copying all records from the 
old file, except those who have not travelled during the last five years, into a new 
file, which is then kept to replace the old one. You are to write this program. 

However, before this ‘purge’ takes place, another program reads the file to print 
a name-and-address list of all clients likely to be deleted unless they have another 
trip soon. They are sent promotional literature with a special offer to induce them 
to travel again. 

The customers selected to receive these brochures are those who have travelled 
once, but not twice, in the last five years, and whose latest trip was not in the 
current or preceding year. 

Write the program to scan through this file, select the customers who meet 
these criteria (and only those customers) and print nice neat address labels for 
them. 

The sticky labels come on a continuous stationery roll: they start in column 8 
and are 40 columns wide. They are 12 lines deep with a 4-line gap between them. 

Include provision in the program to obtain the year of the current run from the 
user terminal at execution time. 


11 


Advanced topics 


You have now almost finished learning the Pascal language: it is time to learn how 
to use it. 

This is the point at which most textbooks bid you a fond farewell; but for us the 
fun is just beginning. 


11.1. Passing functions as parameters 


It is possible for the parameters of a procedure or function to be procedures or 
functions themselves. Thus 


procedure q (a,b : integer; procedure p; function r : real); 


introduces a procedure called Q with four parameters. The first two are quite 
ordinary: they are integers. The third is a procedure named P, and the fourth is a 
function R with a REAL result. This means that Q can be supplied with a different 
procedure and function to work on each time it is called. 

What is the use of that? Well, in numerical integration a general integration 
procedure can be applied to a variety of functions. On any one occasion the 
function being integrated can be passed as a parameter. The program below 
demonstrates this. 

The function SIMP uses Simpson’s Rule to estimate the area under a curve 
between two points A and B. The function defining the curve is F. 

In order to calculate an area with Simpson’s Rule: first divide the interval to be 
measured into an even number of equal-sized slices of width H at an odd number of 
points; then sum the values of the function at both ends (Y), the values at all the 
odd points (S1) and at the even points (S2). (Points are numbered from 1 at the 
low end.) The approximate area is then given by the formula


    (h/3) * (y + 2*s1 + 4*s2)


which gains accuracy as the interval is divided into thinner and thinner slices, until 
the limit of floating-point precision is neared. The function keeps doubling the 
number of slices, and halving their width, until two successive estimates (V1 and 
V2) differ by less than EPSILON, an input parameter. 

The function integrated, NORM, gives the height of a normal curve with a mean
of zero and SD of one (the Gaussian distribution) at any point Z, where Z is
measured in standard deviations from the mean. It is one of the most important
probability distributions in statistics.

The clever thing about SIMP is that as it doubles the number of strips it avoids 
the need to re-calculate the F values already calculated: only the even-numbered 
points are new values. (Think about it.) 


    program simpson;


    var
      iz : integer;
      area,p,z : real;


    function norm(z : real) : real;
      (* standard normal distribution *)
      const p2 = 6.2831853; (* 2 times pi *)


    begin
      norm := 1 / sqrt(p2) * exp(-(z*z/2));
    end;


    function simp(a,b,epsilon : real; function f(real):real) : real;
      (* numerical integration using simpson's rule *)
      var i,n : integer;
          h,s1,s2,v1,v2,x : real;


    begin
      h := (b-a) / 2;
      s1 := f(a) + f(b);
      s2 := f(a+h);
      v1 := h*(s1 + 4*s2) / 3;
      n := 2;
      repeat
        if n>2 then v1 := v2; (* old estimate *)
        s1 := s1 + 2*s2;
        s2 := 0.0; x := a + h/2;

        for i := 1 to n do
          begin
            s2 := s2 + f(x);
            x := h + x;
          end;

        h := h/2; n := 2*n;
        (* twice as many slices of half the thickness *)
        v2 := h*(s1 + 4.0*s2) / 3;
      until abs(v1 - v2) < epsilon;
      simp := v2;
    end;
    (* iterates till 2 successive estimates differ by less than epsilon *)
    (* avoids re-calculating points already used in preceding cycle *)


    begin (* main line *)
      writeln('integration of normal curve:');
      writeln;
      writeln('z':8,'height':8,'area':12);
      writeln;
      for iz := 0 to 40 do
        begin
          z := iz / 10;
          p := norm(z); (* height at that point *)
          area := simp(0.0,z,0.0002,norm);
            (* area up to that point *)
          writeln(z:8:2,p:8:2,area:12:4);
        end;

      writeln;
    end.


In the procedure heading of SIMP the parameter type of F has been specified in 
brackets (REAL). This is a DEC System-10 convention, not standard Pascal: it 
enables the compiler to check that the right number of parameters of the correct 
type are given to functions or procedures that are themselves parameters. 
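
For comparison, ISO 7185 Standard Pascal writes out the complete heading of a functional parameter, including a name for its own parameter. The sketch below shows that form with a fixed number of slices rather than the doubling scheme of SIMP; the names FPARAM, AREA and SQUARE are invented. Compilers differ here: some accept this form only in a standard-conformance mode, and compilers in the Turbo Pascal family generally expect a named procedural type instead, so treat the heading as an assumption to check against your own system.

```pascal
program fparam(output);
(* sketch of a functional parameter in ISO 7185 form, where the
   parameter's own heading (here x : real) is written out in full *)

function square(x : real) : real;
begin
  square := x * x;
end;

function area(a, b : real; n : integer;
              function f(x : real) : real) : real;
  (* fixed-n Simpson's rule; n must be even.  Here i counts from 0
     at the low end, so odd i carries the weight 4. *)
  var i : integer;
      h, sum : real;
begin
  h := (b - a) / n;
  sum := f(a) + f(b);
  for i := 1 to n - 1 do
    if odd(i) then sum := sum + 4 * f(a + i * h)
              else sum := sum + 2 * f(a + i * h);
  area := h * sum / 3;
end;

begin
  writeln(area(0.0, 3.0, 6, square):8:4);
end.
```

Simpson's Rule is exact for quadratics, so the printed value should be very close to 9, the area under x*x between 0 and 3.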

A run of this program produces a table of areas under the normal curve, of which
the first eleven entries are as follows.


INTEGRATION OF NORMAL CURVE:

       Z  HEIGHT        AREA

    0.00    0.40      0.0000
    0.10    0.40      0.0397
    0.20    0.39      0.0792
    0.30    0.38      0.1178
    0.40    0.37      0.1552
    0.50    0.35      0.1913
    0.60    0.33      0.2256
    0.70    0.31      0.2579
    0.80    0.29      0.2880
    0.90    0.27      0.3158
    1.00    0.24      0.3412


(* NEXT 30 LINES OMITTED FOR BREVITY *) 


As written, the program SIMPSON is very rigid. It always uses the function 
NORM, giving a cumulative probability distribution. Since this function cannot be 
integrated analytically, the program has its uses; but the function SIMP is not tied
to NORM, it could integrate a variety of other functions without being rewritten — 
as long as they all take a single REAL argument. 

SIMP therefore is a utility routine, and a good candidate for inclusion in a 
subroutine library — a collection of pre-written procedures and functions that can 
be used in more than one program. Pascal provides the facility of access to 
procedures or functions compiled separately from the main program. This enables 
the Pascal programmer to take advantage of subprogram libraries. The declaration 
of an external procedure or function in the Pascal program that uses it consists 
only of a procedure/function heading followed by the word EXTERN (compiled 
Pascal) or by FORTRAN (for Fortran language routines). 

The details of how a separately compiled procedure or function is linked into a 
Pascal program by operating-system commands would take us outside the scope of 
this book since each host system has its own method. It is usually not difficult, 
however. 


11.2 Variant records 


The syntax of record definition is more complex than I dared admit in Chapter 10 
(see Fig. 11.1). These rules provide for variant records. 


[Figure 11.1 Record definition: syntax diagrams for Record and Field list]

Quite often two different sorts of record are very nearly the same, but not
identical. For instance a married person’s record might differ slightly from a single 
person’s though they shared many fields. In this situation it is unnecessary to define 
two separate record types: one record can encompass both the common fields and 
the differences. 

In Pascal the fixed part of the record comes before the variant part (see above). 
A record may have only one variant part, but variant parts may be nested, so 
unlimited complexity is possible (though not highly recommended). 

Let us take a concrete — or perhaps brick — example. An estate agent maintains 
a file of properties. Most details of each property are the same regardless of whether 
it is for sale or rental; but if it is for sale the agent wants to store the price asked 
and the length of time it has been on the books, whereas if it is to be rented he 
wants to store the monthly rent and the name of the landlord. 

Suitable type declarations appear below. 


    program dwelling;


    type
      kind = (flat,maisonet,cottage,terraced,
              semi,bungalow,detached,mansion);
      forsale = (rent,sale);
      quality = (freehold,centheat,mainroad,loft,
                 neartube,nearrail,balcony,insidewc);
        (* attributes of the dwelling *)
      string24 = packed array [1..24] of char;

      property =
        record
          sort : kind;
          features : set of quality;
          area : string24;
          address : packed array [1..80] of char;
          gardens, bedrooms : 0..99;
          landacre : real;
          (* variant part *)
          case salerent : forsale of
            sale:
              (cost : integer; (* asking price in pounds *)
               waittime : integer);
            rent:
              (monthly : integer; (* rent in pounds *)
               landlord : string24);
        end; (* of property definition *)

Here the field SALERENT is the tag field. The tag field is present as a field in its
own right, but it also serves to discriminate between the two related sorts of data. 
Given the foregoing declarations one might declare variables such as 


var 
minbeds,mingards,maxprice : integer; 
desired, unwanted : set of quality; 
proptype : set of kind; 
estates : file of property; 
(* main real-estate data-file *) 


and then go on to code the search process for a desirable residence as follows. 


(* get desired qualities, 
unwanted qualities, 
min. no. of bedrooms, gardens 
maximum price and 
house type/types from client *) 


    reset(estates);
    (* search for des. res. *)
    while not eof(estates) do
      with estates^ do begin
        if (salerent = sale)
           and (bedrooms >= minbeds)
           and (gardens >= mingards)
           and (unwanted * features = [])
           and (desired <= features)
           and (sort in proptype)
           and (cost <= maxprice)
        then
          (* print full details *);

        get(estates);
      end; (* with and while *)


We have the kernel of a useful system here. The customer can make quite 
sophisticated requests. For example, we could ask for all freehold terraced or 
semidetached houses with central heating near a railway station but off a main road 
that have a garden and at least three bedrooms and cost no more than £40 000 — 
if we frame our question correctly. 

Incidentally, the CASE statement is tailor-made for dealing with variant records, 
as suggested by the use of the word CASE in the record definition. For instance, if
P is a property 


    case p.salerent of
      sale:
        (* print cost & wait time *);
      rent:
        (* print landlord's name and rent per month *);
    end;


is a skeleton for dealing with the two alternatives. 
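
Filled in, the skeleton might look like the sketch below. It is an illustration only: the PROPERTY record is cut down to one fixed field and a shortened variant part, and the rent of 250 is invented. Note that the tag field is set before the fields of the matching variant are touched.

```pascal
program vardemo(output);
(* sketch: a cut-down property record with a tag field; the
   values assigned below are invented for illustration *)
type
  forsale = (rent, sale);
  property =
    record
      bedrooms : 0..99;
      case salerent : forsale of
        sale : (cost : integer; waittime : integer);
        rent : (monthly : integer)
    end;
var p : property;
begin
  p.bedrooms := 3;
  p.salerent := rent;          (* set the tag first ... *)
  p.monthly := 250;            (* ... then the matching fields *)
  case p.salerent of
    sale : writeln('for sale at ', p.cost:1);
    rent : writeln('to let at ', p.monthly:1, ' per month')
  end;
end.
```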


11.3. Dynamic data structures 


Till now we have only dealt with variables that are declared at the head of a program 
or subprogram and remain in existence until it is finished. Such structures are static. 
But for some problems static storage is inadequate: we require structures that 
can grow or shrink dynamically. In Pascal this entails the use of pointers. 

We can declare a pointer type as follows 


    type basetype = (* basic data *) ;
         signpost = ^basetype;


whereby variables of type SIGNPOST may be used to refer, indirectly, to data of
type BASETYPE. Then if S is a variable of type SIGNPOST its value will be an
address, and the contents of that store address will be a value of the BASETYPE.
Pascal distinguishes S (the pointer) from S^ (the object pointed at).

Pointers really come into their own in connection with flexible data structures 
of unpredictable size such as linked lists. Consider the problem of reversing the 
order of a sequence of characters of arbitrary length, terminated by a fullstop. The 
program below solves it, using a linked list. 


    program linklist;
      (* reverses a character sequence of unknown length *)
    const stopcode = '.';


    type
      link = ^cell; (* links point to cells *)
      cell = record
               head : char; (* data *)
               tail : link; (* pointer to next cell *)
             end; (* of cell *)


    var this, last : link;


    begin
      this := nil; (* special null pointer *)
      while input^ <> stopcode do (* get next ch and store *)
        begin
          last := this; (* previous item *)
          new(this); (* create a cell *)
          this^.head := input^;
          this^.tail := last;
            (* latest item stuck on front *)
          get(input); (* move along to next ch *)
        end; (* input loop *)

      (* output starts at end and works backwards *)
      while this <> nil do
        begin write(this^.head);
          this := this^.tail; (* next one *)
        end;
      writeln;

    end.


Several points are worth noting. 

In the first place NEW(P) is used to allocate dynamic storage. P must be a pointer
variable: if P is of type ^T then enough storage for one datum of type T is allocated
and P is set to its store address. Thus P is a VAR parameter. There is also a procedure
DISPOSE(P) which frees the storage that P points to. (Warning: DISPOSE(P) may
de-allocate all storage cells allocated later than P as well! See Section 11.4.)

Secondly, there is a special word NIL which points nowhere. It is a constant 
belonging to all pointer types. 

Thirdly, there is a great difference between P and P^. P has a value that is an
address; P^ has a value that is the contents of store at that address. Fig. 11.2
illustrates this. Thus when we say LAST := THIS (both being pointer variables)
LAST and THIS now point to the same thing; but if we had said LAST^ := THIS^
then the contents of the location THIS points to would be copied into the location
LAST points to. Fig. 11.3 may help elucidate the difference.

Finally, it is permissible to declare the type LINK as a pointer to items of the 
type CELL before CELL itself has been declared. This is the only case in Pascal, 
apart from FORWARD procedures (Section 7.5), of an identifier being used in 
advance of its declaration. 
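
The difference between assigning a pointer and copying what it points at, described in the third point above, can be checked with a few cells of storage. In this sketch the pointer OTHER and the values stored are invented; LINK here points to a plain integer rather than to a CELL record, purely to keep the example short. Both WRITELN statements should print 10.

```pascal
program copydemo(output);
(* sketch: pointer assignment versus copying what is pointed at *)
type link = ^integer;
var this, last, other : link;
begin
  new(this);   this^ := 1;
  new(other);  other^ := 2;
  last := this;        (* last and this now share one cell *)
  this^ := 10;
  writeln(last^:4);    (* the shared cell: 10 *)
  other^ := this^;     (* contents copied; other keeps its own cell *)
  this^ := 20;
  writeln(other^:4);   (* the copy is unaffected: still 10 *)
  dispose(other);
  dispose(this);
end.
```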


[Figure 11.2 A pointer variable: P and the item P^ that it points to]

[Figure 11.3 Pointer assignments: before and after diagrams for LAST := THIS
and for LAST^ := THIS^]


To sum up, pointers are tools for the construction of complicated and flexible 
data structures where the organization of information is important as well as its 
value and where the organization is not fixed for the duration of the run. 


11.4 Example program [CHOPPER] 


Our example deals with Boolean expressions. The program does not evaluate them: 
it simplifies them by symbolic manipulation. 

The expressions are held as binary ‘trees’ built up using pointers in a very simple 
way (see Fig. 11.4). 

The binary tree is one of the classic data structures of computer science. It is
defined as a root, which is a node with two branches, each of which may be empty or
is itself a subtree. If both branches are empty then the node is a ‘leaf’. Binary
trees can be used to represent arithmetic or logical expressions, pedigrees, symbol 
tables, states of a game, the typology of dinosaurs, the execution of a search 
procedure and much else besides. 
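
Because the definition is recursive, routines that work on trees tend to be recursive too. The following is a small sketch, not part of the example program: the names GROW and LEAVES are invented, and the node layout (two branches and a one-character item) is an assumption chosen to match the description above. It builds the tree for (A + (B . C)) by hand and counts its leaves.

```pascal
program leafcount (output);
(* sketch: a binary tree built by hand, and a recursive count of its leaves *)

type link = ^node;
     node = record
              head,tail : link;     (* left and right branches *)
              body : char;          (* item held at the node *)
            end;

var tree : link;

function grow (lh : link; x : char; rh : link) : link;
(* makes a node holding x with the given branches *)
var t : link;
begin
  new(t);
  t^.head := lh; t^.tail := rh; t^.body := x;
  grow := t;
end;

function leaves (t : link) : integer;
(* an empty tree has no leaves; a node with two empty branches is one leaf *)
begin
  if t = nil then leaves := 0
  else if (t^.head = nil) and (t^.tail = nil) then leaves := 1
  else leaves := leaves(t^.head) + leaves(t^.tail);
end;

begin
  (* the expression (A + (B . C)) as a tree *)
  tree := grow(grow(nil,'A',nil), '+',
               grow(grow(nil,'B',nil), '.', grow(nil,'C',nil)));
  writeln('leaves = ', leaves(tree));    (* three leaves: A, B and C *)
end.
```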


Figure 11.4 A binary tree (the expression ('A + (B.1)) and the tree that represents it)


Trees, and other dynamic structures, should not be regarded as exotic tropical
plants that flourish only in the cloistered gardens of academia: they have simple, 
practical, money-making applications. The competent Pascal programmer should 
not fear to pluck fruit from the tree of knowledge (binary or otherwise)! This 
example is intended to convince you of that. 

Many millions, even billions, of dollars are invested annually in the design of 
integrated circuits — which implement logic functions on a silicon chip. Computer 
programs are extensively used in the design process. Some are CAD packages 
involving high-resolution graphics, but many rely on the rules of Boolean algebra. 

To improve a tentative design: 


take the circuit diagram of the proposed device;
express it as one or more Boolean expressions;
apply the simplification rules;
translate back into a circuit diagram.


In theory you now have a blueprint for a new, smaller, cheaper, circuit. 

In practice the simplification procedure is more complex than we have room for; 
but you can gain an insight into the sort of techniques used from this program which 
uses only the fundamental laws of logic. 


(1.a) => a        (1+a) => 1
(0.a) => 0        (0+a) => a
(a.a) => a        (a+a) => a

'0 => 1
'1 => 0
''a => a


Here + stands for OR, . represents AND and ' is NOT.


program chopper (input,output);
(* simplifies boolean expressions held as trees *)


const lbra = '('; rbra = ')';
      notsign = '''';    (* apostrophe *)
      andsign = '.'; orsymbol = '+';    (* logical operators *)


type link = ^node;
     node = record    (* fundamental tree-building element *)
              head,tail : link;    (* left and right branches *)
              body : char;         (* central item *)
            end;


var tree,bush,twig : link; 


function makenode (x : char) : link;
(* creates simple node with no subtrees, a 'leaf' *)
var n : link;
begin
  new(n);
  n^.head := nil; n^.tail := nil;
  n^.body := x;


  makenode := n;
end; (* of make node *)


function maketree(lh : link; x : char; rh : link) : link;
(* creates tree with one or two branches *)
var t : link;


begin new(t);    (* allot storage space *)


  t^.head := lh;
  t^.tail := rh;
  t^.body := x;


  maketree := t;
end; (* of make tree *)


function readatom(var f : text) : link;
(* reads a single-character item *)


begin
  while f^ = ' ' do get(f);    (* skip blanks *)
  readatom := makenode(f^);    (* the datum *)
  get(f);                      (* pass over the character *)

end; (* of read atom *)


function readtree(var f : text) : link;
(* reads in and stores a fully bracketed boolean expression *)
var t : link; firstchr : char;


begin
  while not (f^ in [lbra,notsign]) do get(f);
      (* skip up to 1st significant character *)
  firstchr := f^;
  get(f); new(t);
  while f^ = ' ' do get(f);    (* skip over blanks *)


  if firstchr = notsign then
    begin    (* only one operand *)
      t^.head := nil; t^.body := notsign;
      if (f^ = lbra) or (f^ = notsign) then
        t^.tail := readtree(f)         (* subtree *)
      else t^.tail := readatom(f);     (* simple item *)
    end
  else
    begin    (* binary operation *)
      if f^ in [lbra,notsign] then t^.head := readtree(f)
      else
        t^.head := readatom(f);        (* left operand *)
      while f^ = ' ' do get(f);
      t^.body := f^;                   (* operator *)
      get(f); while f^ = ' ' do get(f);
      if f^ in [lbra,notsign] then
        t^.tail := readtree(f)
      else t^.tail := readatom(f);     (* right operand *)
      while f^ <> rbra do get(f);
      get(f);    (* skip past closing bracket *)
    end;
  readtree := t;    (* final result, a pointer *)
end; (* of read tree *)


function atom(t : link) : boolean;
(* tests if t is elementary or not *)


begin

  if t = nil then atom := true

  else atom := (t^.head = nil) and (t^.tail = nil);
end; (* of atom function *)


procedure printree(var f : text; e : link);
(* prints out a tree-structure on file f,
   as a fully bracketed boolean expression *)


begin
  if e = nil then
    write(f,'()')
  else
  if atom(e) then write(f,e^.body)
  else
  if e^.body = notsign then    (* no need for brackets *)
    begin
      write(f,notsign);
      printree(f,e^.tail);
    end
  else
    begin
      write(f,lbra);
      if e^.head <> nil then printree(f,e^.head);
      write(f,e^.body);
      if e^.tail <> nil then printree(f,e^.tail);
      write(f,rbra);

    end;    (* takes care of atoms & trees *)

end; (* of print tree *)


function same(t1,t2 : link) : boolean;
(* tests whether t1 and t2 are equivalent *)


begin
  if t1 = nil then same := t2 = nil
  else if t2 = nil then same := false
  else if atom(t1) and atom(t2) then
    same := t1^.body = t2^.body
  else same := (t1^.body = t2^.body) and
         (same(t1^.head,t2^.head) and same(t1^.tail,t2^.tail)
          or
          same(t1^.head,t2^.tail) and same(t1^.tail,t2^.head));
          (* allows that a+b = b+a etc. *)
end; (* of equivalence test *)


function tidy(e : link) : link;
(* simplifies a boolean expression *)
var t,lh,rh : link;
begin
  if atom(e) then
    t := e    (* cannot simplify further *)
  else
    begin
      lh := tidy(e^.head); rh := tidy(e^.tail);
      if e^.body = notsign then    (* not *)
        if rh^.body = notsign then
          t := rh^.tail
          (* eliminates double negatives *)
        else
        if rh^.body = '0' then
          t := makenode('1')
        else
        if rh^.body = '1' then t := makenode('0')
        else t := maketree(nil,notsign,rh)
      else if e^.body = andsign then
        begin    (* logical and *)
          if (lh^.body = '0') or (rh^.body = '0') then
            t := makenode('0')
          else if lh^.body = '1' then
            t := rh
          else if rh^.body = '1' then
            t := lh
          else if same(rh,lh) then
            t := lh
          else t := maketree(lh,andsign,rh);
        end
      else if e^.body = orsymbol then
        begin    (* logical or *)
          if lh^.body = '0' then
            t := rh
          else if rh^.body = '0' then
            t := lh
          else if (lh^.body = '1') or (rh^.body = '1') then
            t := makenode('1')
          else if same(lh,rh) then
            t := lh
          else t := maketree(lh,orsymbol,rh);
        end
      else t := e;
    end;
  tidy := t;


end; (* of simplification function *)


begin    (* main line *)
  twig := makenode('0');


  repeat
    writeln;
    writeln('give your boolean expression please:');
    tree := readtree(input); writeln;
    write('expression : ');
    printree(output,tree); writeln;
    write(' =');
    bush := tidy(tree);    (* simplified version *)
    printree(output,bush); writeln;

  until same(bush,twig);


  writeln;
end.


This program is not powerful enough to assist in the analysis and design of
realistically complex logic circuits, but it is a step in the right direction. It would be 
relatively easy to teach it that (A + 'A) => 1 and that (A .'A) => 0, and it would 
not require a major overhaul to add De Morgan’s laws and the distributive laws (left 
as an exercise for the reader!). To be really useful though it ought to allow 
interlinked definitions — i.e. given that A = one expression and that B = another and 
C = a third it should simplify each separately, then try to insert the simplified 
expressions where the terms A, B and C appear in the three definitions, to see if 
greater simplicity could be obtained. 
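
As a hint of how the first of these extensions might go, here is a hedged sketch of a test for 'opposite' expressions. It is not part of CHOPPER: the functions NEGATES and OPPOSITE are invented for illustration, and SAME is cut down to an exact-match test so that the fragment stands alone and can be run by itself.

```pascal
program opposites (output);
(* sketch: recognizing that one tree is the negation of another *)

const notsign = '''';

type link = ^node;
     node = record
              head,tail : link;
              body : char;
            end;

var a,nota : link;

function maketree (lh : link; x : char; rh : link) : link;
var t : link;
begin
  new(t);
  t^.head := lh; t^.tail := rh; t^.body := x;
  maketree := t;
end;

function same (t1,t2 : link) : boolean;
(* cut-down equivalence test: identical shape and contents only *)
begin
  if (t1 = nil) or (t2 = nil) then same := t1 = t2
  else same := (t1^.body = t2^.body) and
               same(t1^.head,t2^.head) and same(t1^.tail,t2^.tail);
end;

function negates (t1,t2 : link) : boolean;
(* true if t1 is NOT applied to something equivalent to t2 *)
begin
  if t1 = nil then negates := false
  else if (t1^.body = notsign) and (t1^.head = nil) and (t1^.tail <> nil) then
    negates := same(t1^.tail,t2)
  else negates := false;
end;

function opposite (t1,t2 : link) : boolean;
(* true for a pair such as A and 'A, in either order *)
begin
  opposite := negates(t1,t2) or negates(t2,t1);
end;

begin
  a := maketree(nil,'A',nil);
  nota := maketree(nil,notsign,a);
  writeln(opposite(a,nota));     (* these two are opposites *)
  writeln(opposite(a,a));        (* these two are not *)
end.
```

Inside TIDY one extra test in each branch would then probably suffice: in the OR case something like 'else if opposite(lh,rh) then t := makenode('1')', and in the AND case the same test yielding makenode('0').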

Nevertheless, even as it stands, it can occasionally come up with some impressive
simplifications, as the following example run on the DEC System-10 shows.


GIVE YOUR BOOLEAN EXPRESSION PLEASE:
((A.1) + '('1+A))


EXPRESSION : ((A.1)+'('1+A))
           = (A+'A)


GIVE YOUR BOOLEAN EXPRESSION PLEASE:
((A.B) + (B.('0.A)))


EXPRESSION : ((A.B)+(B.('0.A)))
           = (A.B)


GIVE YOUR BOOLEAN EXPRESSION PLEASE:
'''?


EXPRESSION : '''?
           = '?


GIVE YOUR BOOLEAN EXPRESSION PLEASE:
((((A.A) + A). 1) +0)


EXPRESSION : ((((A.A)+A).1)+0)
           = A


GIVE YOUR BOOLEAN EXPRESSION PLEASE:
((A.1) . '(1+A))


EXPRESSION : ((A.1).'(1+A))
           = 0


What CHOPPER demonstrates is that with pointers and dynamic linked structures 
we can use the computer for symbolic manipulations, not just calculations. 

The difference between manipulating an expression and merely evaluating it is 
a profound one: it is the difference between algebra and arithmetic. The possibility 
of rearranging symbolic structures opens the way into the field of theorem-proving 
in particular and non-numerical mathematics in general. It reminds us strongly that
the digital computer is a general-purpose symbol-manipulator, not simply an 
overgrown calculator. 

One other aspect of CHOPPER that would repay further study is its reliance on 
recursion. READTREE and TIDY especially use recursion heavily. The reason for 
this is that the binary tree is a recursively defined entity, in that the branches of a 
tree are, if they exist, themselves trees. With recursively defined data structures the 
natural mode of processing is recursion. Indeed it would be very hard to re-code
CHOPPER to exclude all recursive calls, and doing so would obscure its purpose.

Two weaknesses in the program are worth mentioning. Firstly it is extravagant 
with storage: it keeps using NEW to make space for new structures but never invokes 
DISPOSE. Storage that is no longer wanted can be re-cycled by means of a list of 
available cells (avoiding the pitfalls of DISPOSE which is often absent or 
implemented in a way that makes it unusable). The process of re-using unwanted 
store is known as ‘garbage collection’ (see also Chapter 12). 
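
A list of available cells needs very little code. The sketch below is one possible arrangement and not something CHOPPER contains: GETNODE, FREENODE, AVAIL and the counter FRESH are all invented names. Discarded nodes are chained together through their TAIL fields, and NEW is called only when the chain is empty.

```pascal
program recycle (output);
(* sketch: a list of available cells used instead of DISPOSE *)

type link = ^node;
     node = record
              head,tail : link;
              body : char;
            end;

var avail : link;        (* chain of discarded nodes, linked through tail *)
    fresh : integer;     (* how many times NEW was really called *)
    p,q : link;

function getnode : link;
(* takes a node from the available list if possible, else allocates one *)
var n : link;
begin
  if avail = nil then
    begin new(n); fresh := fresh + 1 end
  else
    begin n := avail; avail := avail^.tail end;
  n^.head := nil; n^.tail := nil; n^.body := ' ';
  getnode := n;
end;

procedure freenode (n : link);
(* returns a single unwanted node to the available list *)
begin
  if n <> nil then
    begin n^.tail := avail; avail := n end;
end;

begin
  avail := nil; fresh := 0;
  p := getnode;          (* list empty: storage comes from NEW *)
  freenode(p);           (* p's node is now available again *)
  q := getnode;          (* re-uses the same node: no call of NEW *)
  writeln('calls of new: ', fresh);
  writeln('same cell re-used: ', p = q);
end.
```

To adopt this in CHOPPER, MAKENODE and MAKETREE would call GETNODE in place of NEW. The harder question, which this sketch does not answer, is deciding when a node is truly unwanted, since TIDY sometimes returns parts of its argument unchanged.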

The second weakness is that the input expressions have to be fully bracketed.
CHOPPER knows nothing about the precedence of AND over OR, so that


     a.b+c

must be written as

     ((a.b)+c)


which can be irksome for the user. There are a number of algorithms for reading in
partly bracketed expressions and forming trees (Gries, 1971) and you might care to
ponder the work involved in upgrading READTREE to relieve the user of this
chore, since there are no set exercises for this chapter.
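
To give a flavour of what is involved, here is a rough sketch of one well-known technique, recursive descent, in which each level of precedence gets its own routine. It is an illustration under stated assumptions and not a replacement for READTREE: it reads from a string instead of a file, it builds the fully bracketed text instead of a tree, and it relies on the non-standard STRING type found in Turbo Pascal, Free Pascal and similar systems. All the names in it (PEEK, FACTOR, TERM, EXPR, SRC, FULL, PLACE) are invented.

```pascal
program bracket (output);
(* sketch: rewrites a partly bracketed expression in fully bracketed form *)
(* assumes the non-standard STRING type of e.g. Turbo or Free Pascal *)

const lbra = '('; rbra = ')';
      notsign = ''''; andsign = '.'; orsymbol = '+';

var src,full : string;    (* text to be read, and the result *)
    place : integer;      (* position of the next unread character *)

function peek : char;
(* next non-blank character, or a blank once the text is exhausted *)
var ch : char;
begin
  ch := ' ';
  while (place <= length(src)) and (ch = ' ') do
    begin
      ch := src[place];
      if ch = ' ' then place := place + 1;
    end;
  peek := ch;
end;

procedure expr (var s : string); forward;

procedure factor (var s : string);
(* factor = NOT factor, or a bracketed expr, or a single character *)
var ch : char;
begin
  ch := peek; place := place + 1;
  if ch = notsign then
    begin factor(s); s := notsign + s end
  else if ch = lbra then
    begin
      expr(s);
      if peek = rbra then place := place + 1;   (* pass closing bracket *)
    end
  else s := ch;
end;

procedure term (var s : string);
(* term = factor { . factor }, so AND is grouped before OR *)
var r : string;
begin
  factor(s);
  while peek = andsign do
    begin
      place := place + 1;
      factor(r);
      s := lbra + s + andsign + r + rbra;
    end;
end;

procedure expr;
(* expr = term { + term }; parameter list is in the FORWARD declaration *)
var r : string;
begin
  term(s);
  while peek = orsymbol do
    begin
      place := place + 1;
      term(r);
      s := lbra + s + orsymbol + r + rbra;
    end;
end;

begin
  src := 'a.b+c'; place := 1;
  expr(full);
  writeln(src, ' becomes ', full);
end.
```

For the text a.b+c the result should be ((a.b)+c). No error checking is attempted, so a malformed expression will produce nonsense and not a diagnostic.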


12 Programming practice


Learning the rules of Pascal, or any other programming language, does not 
automatically confer mastery of the art of computer programming. Programs may 
be well or badly written. Well-written programs are short, elegant, legible, and they 
work. Badly written programs are long, ugly, untidy and do not work. (Frequently 
the author claims that they are ‘almost working’, which may be true, but a program 
that only does what it should 80% of the time is 100% useless.) 

In this chapter you will be given some guidelines on constructing well-written 
software, which are collected together under the banner of ‘structured 
programming’. Structured programming is not a religion — even if some of its 
adherents treat it as one — it is simply a set of disciplines which help in the program 
design process. These disciplines include: choosing only the well-behaved control 
constructs of sequence, selection, repetition and embedding (see Chapter 6); making 
the program reflect the structure of the data; and adopting a consistent method of 
program design and development. 

The key word is design: program design is where things go right or wrong; 
software writing is a minor concern. If the design is flawed the code will be a mess; 
if the design is correct the coding will take care of itself. 

It is important in all this not to lose sight of the objectives of structured 
programming, which are: 


to produce reliable software which works under a wide range of conditions (and 
does not ‘crash’ at the first erroneous input); 

to produce it quickly; 

to ensure it can be maintained, i.e. amended and improved later, possibly by 
another programmer; 

and to make it efficient. 


Efficiency is not a matter of saving milliseconds of runtime or a few bytes of 
memory. In this day and age we must think first of economizing human resources, 
not the machine’s. If that means making the computer work harder, so be it: that is 
what it is for.


12.1 Program design


The transition from a small problem, which is simply an exercise in coding, to a 
large problem, involving all the skills of the programmer’s art, introduces several 
new considerations. Firstly, a large program must be organized into parts: these 
subdivisions are not given, but must be chosen by the programmer. Secondly, a 
scheme for representing the data is not always obvious, but must be devised. 
Finally, a big system will evolve: parts of it will require modification long after its 
first creation while other parts retain their original form. Modifications are 
requested because understanding of what the system should really do and how it 
might best do it increases gradually, and because, with experience, users find they 
want additional functions performed. Thus the initial organization must be chosen 
with an eye to the unknown future. 

On top of this is the problem of defining the task. Many tasks that are extremely 
well defined pose very difficult design problems; but in the real world programmers 
are often confronted with problems that are messy and ill-specified. The job of 
specifying just what is entailed is usually referred to as ‘systems analysis’, as distinct 
from programming; but programmers may need to turn their hands to it on 
occasion. Even if you are your own user the process of turning your vague ideas 
into the plan for a program can be a long and arduous one. 


12.1.1 Program subdivision 


With current programming languages, including Pascal, any large program will be 
organized as a hierarchy of subprograms. A very large system may be composed of 
several complete programs. The programs in such a suite, or the routines in a large 
program, are known as ‘modules’. The alternatives to hierarchic modular structure 
are not well understood, and are not further considered here. 

One strategy for arriving at a hierarchical program structure, often called the 
‘top-down’ approach, is to split each major process into a small number of 
subdivisions. Each of these is given a name and its function — the processing it 
accomplishes — is defined precisely by stating exactly what inputs it requires and 
what outputs it delivers. How the subprocess actually carries out its job does not 
matter at this stage. 

When all its subprocesses have been defined in this way the main process is 
considered defined, and it should be possible to write the code for it. It will consist 
almost entirely of calls to subprograms. 

Once any module is coded, attention can be focused on each of its subprocesses 
in turn, using exactly the same strategy of decomposition. Ultimately subprocesses 
are reached that are trivial, so the decomposition halts. Although apparently at 
every stage all the complexities are relegated to lower levels, it will be found at the 
last that nothing complicated remains to be done. That, at least, is the theory! 
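
The flavour of the method can be conveyed by a tiny sketch. The task (averaging a list of marks) and every name in it are invented for illustration. The main process was written first, as three calls; each subprocess is specified by a comment stating its inputs and outputs; and the input routine is still a stand-in that supplies fixed values.

```pascal
program topdown (output);
(* sketch: a main process written first, as calls to named subprocesses *)
(* the task and all the names are invented for illustration *)

const maxmarks = 100;

type marklist = array [1..maxmarks] of integer;

var marks : marklist;
    count : integer;

procedure getmarks (var m : marklist; var n : integer);
(* input: none.  output: n marks stored in m[1..n] *)
begin
  (* stand-in until the real input routine is written *)
  m[1] := 40; m[2] := 55; m[3] := 70;
  n := 3;
end;

function average (var m : marklist; n : integer) : real;
(* input: n marks in m.  output: their mean, or zero if there are none *)
var i,sum : integer;
begin
  sum := 0;
  for i := 1 to n do sum := sum + m[i];
  if n > 0 then average := sum / n else average := 0;
end;

procedure report (n : integer; mean : real);
(* input: number of marks and their mean.  output: one printed line *)
begin
  writeln(n, ' marks, average ', mean:6:2);
end;

begin   (* main process: almost entirely calls *)
  getmarks(marks,count);
  report(count,average(marks,count));
end.
```

When GETMARKS is eventually refined to read real data, neither the main process nor the other two routines should need to change.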

This top-down approach is called ‘stepwise refinement’ by Wirth, the inventor of 
Pascal. It provides a framework within which the programmer can concentrate on
developing a hierarchy of procedures that are at once compact, efficient and easy 
to understand. The size of each module should be kept small, so that the 
instructions constitute an immediately comprehensible outline of the action it 
carries out. 


12.1.2 Modularity 


One of the most important general principles for designing program hierarchies is 
centralization of function. Each function that is required should be performed by a 
single subprogram. The advantages of doing this are many. Most obviously, space is 
saved by not duplicating the same segment of program in various places; likewise, 
programming time is saved by not writing essentially the same sequence of 
statements more than once. The most important gain, however, is flexibility. Almost 
always, proposed program changes are described as amendments to existing 
functions. If the functions are centralized, modification is usually quite easy; if
not, it may be completely impracticable, requiring multiple changes throughout
the program that amount to a major overhaul.

A final advantage of centralization is that it improves the intelligibility of 
the program. This clarification comes about mainly because this approach accords 
with the way people come to grips with complex systems — in terms of the 
functions they perform. A methodical build-up from a set of small well-defined 
modules each of which has a clearly specified role to play in the overall operation is 
the most natural way of preventing complexity leading to anarchy. 

Working through the program from the top down is an excellent way of 
discovering whether the same operation recurs in several places. If the several uses 
of the operation are not identical it is still possible in most cases to write a single 
subprogram, with parameters, that performs all the variant tasks. 
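
A trivial sketch makes the point. Suppose a report needs lines of asterisks in one place and lines of dashes in another. The procedure RULE below (an invented name) does both jobs, with the character and the length supplied as parameters.

```pascal
program rules (output);
(* sketch: one subprogram with parameters replacing several near-copies *)

procedure rule (ch : char; n : integer);
(* prints a line made of n copies of ch *)
var i : integer;
begin
  for i := 1 to n do write(ch);
  writeln;
end;

begin
  rule('*',20);          (* heading line *)
  writeln('RESULTS');
  rule('-',7);           (* underline: same operation, different details *)
  rule('*',20);
end.
```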

Each module should only do one job. This permits it to be used in several places. 
Clearly a subprogram carrying out many different functions is likely to be
over-specialized and unlikely to be generally useful.

Nevertheless the subprograms near the top of the program hierarchy are often 
highly particular. They occur only once. Only as you work downwards do you 
encounter functions that are common to different parts of the program. 

The word ‘function’ is used here vaguely (in contrast with the Pascal term 
FUNCTION which has a precise meaning) because the programmer has considerable 
freedom in breaking a task into subtasks, which may be coded into FUNCTIONS or 
PROCEDURES as convenient. 

Another essential feature of modular programs is the isolation of subprocesses. 
Each subprogram is insulated from the rest of the program except for a few
well-defined connections. Typically only its parameters are used to pass information
between it and its environment. Only with such isolation is it possible to be
confident that making alterations in one part of the system will not have unforeseen
(and probably undesirable) consequences in other parts. The links between a
subprogram and the rest of the program should be clearly set down and all 
communication directed through the proper channels. 

The temptation to pass information ‘under the counter’ by means of global 
variables and flags should be strenuously resisted. There is nothing more demoralizing
than having your painstakingly constructed edifice of software collapse like a house 
of cards when one minor change has unexpected and far-reaching repercussions. 

A modular program is like a model made from Lego bricks: each building-block 
interlocks in a standard way with any other. A program that is just put together ad 
hoc is like a dry stone wall made from irregularly shaped chunks of rock fitted 
together. You can admire the skill and dedication of the builder without wanting to 
emulate his feat. 


12.1.3 Data representation


So far only the organization of processing has been considered. Organizing the data, 
i.e. choosing representations for the information of the task, is perhaps even more 
crucial. There is one school of thought which believes that the program structure 
should in some sense mirror the form of the data, and that a mismatch between 
processing and data structure will result in a program that is grossly inefficient, if 
indeed it works at all. Yet many programmers regard data representation almost as 
an optional extra — to be dealt with, if ever, as an afterthought when the ‘real’ 
business of coding is finished. The presence of predefined data types (especially 
numbers) together with the hardware and software to deal with them tends to mask 
the fact that information on the real world has to be mapped onto the underlying 
binary storage of the computer. Programming is not exclusively about numbers: it 
is about symbols which stand for things or events, and which symbolize their 
relevant features well or badly. 

Concern with representation leads to a style of program design known as the 
‘bottom-up’ approach. Given a representation, various processes must be performed 
on the information represented — getting data in and putting them out, at the very 
least. The programmer attends to the implementation of these basic processes before 
going on to consider how the entire program should be structured. 

For example, in an application needing numbers represented to 100 decimal 
places, the programmer would be well advised to decide on a compact and 
reasonably efficient data representation, write the routines for input, output 
and arithmetic on such numbers, and test these before turning to the original 
application at all. In this way a library of subprograms could be developed that 
might prove useful in a variety of different programs. 
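
By way of illustration, such a set of routines might begin along the following lines. This is a sketch under stated assumptions, not a recommended design: it handles non-negative numbers only, it stores one decimal digit per array element (simple but hardly compact), and the names LONGNUM, CLEAR, ADD and SHOW are invented.

```pascal
program longsum (output);
(* sketch: numbers held to 100 decimal places, with add and output routines *)
(* representation and names are assumptions: non-negative values only *)

const places = 100;

type digit = 0..9;
     longnum = record
                 whole : integer;                    (* part before the point *)
                 frac : array [1..places] of digit;  (* digits after the point *)
               end;

var x,y,z : longnum;

procedure clear (var a : longnum);
(* sets a to zero *)
var i : integer;
begin
  a.whole := 0;
  for i := 1 to places do a.frac[i] := 0;
end;

procedure add (var a,b,c : longnum);
(* c := a + b, working from the last decimal place leftwards *)
var i,sum,carry : integer;
begin
  carry := 0;
  for i := places downto 1 do
    begin
      sum := a.frac[i] + b.frac[i] + carry;
      c.frac[i] := sum mod 10;
      carry := sum div 10;
    end;
  c.whole := a.whole + b.whole + carry;
end;

procedure show (var a : longnum; n : integer);
(* prints a to its first n decimal places *)
var i : integer;
begin
  write(a.whole:1, '.');
  for i := 1 to n do write(a.frac[i]:1);
  writeln;
end;

begin
  clear(x); x.frac[1] := 7; x.frac[2] := 5;     (* 0.75 *)
  clear(y); y.whole := 1; y.frac[1] := 5;       (* 1.5  *)
  add(x,y,z);
  show(z,4);                                    (* 0.75 + 1.5 *)
end.
```

A program that uses these routines never touches the digits directly, so the representation could later be changed (to several digits per element, say) by rewriting only CLEAR, ADD and SHOW.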

There are advantages in working from the bottom of a program towards the top. 
One of these is the separation of representation details on the one hand from the 
use of the stored information on the other. For example, having created a complete 
set of subprograms, it is often possible at some later date to alter (i.e. improve) 
the underlying representation without modifying any part of the program except
the low-level data-handling routines. 

Occasionally the bottom-up approach provides important information about the 
qualities of a proposed representation. If data structures are devised when needed,
only a few of the operations on those structures are uppermost in the programmer’s
mind at the time. Although the ad hoc representation may be extremely good for 
these few operations, it may be found inadequate or exceedingly wasteful when the 
full range of operations is required. Coding the whole collection at the outset can 
expose such deficiencies before a large amount of work based on an unsuitable data 
format has been wasted. 

The danger with the bottom-up approach is that the programmer spends so 
much time digging the foundations that the house never gets built at all. One might 
start wanting to draw simple outline shapes made of straight lines on a plotter or 
screen, decide on a data representation and end up with a vast and impressive 
graphics package that handles 3D objects like spheres, that plots all sorts of curves, 
that simulates perspective — and that really is not needed to do the job in hand. 

An additional issue often arises with representations that permit insertions and 
deletions, such as lists and list structures. After a data structure is no longer needed 
it ought to be erased and the space it occupied returned to free storage. This applies 
to the dynamic structures of Pascal which are built with the NEW procedure and
linked by pointers. 

If the data representation has been well thought out, there will be little difficulty 
in coding the erasing process; but there may still be a problem in deciding which 
subprogram should do the erasing. This is the problem of responsibility: the module 
that is responsible for a structure knows when it is no longer needed. In simple 
situations where a structure is created, used and discarded within a single sub-routine, 
the issue of responsibility is easily settled. However, if structures are created by 
routines that have no control over what other routines will do with them and no 
indication of what would render them superfluous then some more elaborate set of 
responsibility conventions becomes necessary. The principle that functions should 
be centralized whenever possible suggests that responsibility should rest in one 
routine (the so-called ‘garbage collector’) that has independent access to all
dynamically growing structures and some way of determining when they have lost 
their usefulness. In some languages, Lisp for instance, a garbage collector is part of 
the system and operates automatically, freeing the user from this housekeeping 
chore. Unfortunately this is not true of Pascal. In Pascal the DISPOSE routine often 
does not perform selective deletion, and the programmer may be forced to maintain 
a list (or even several lists) of available store, to which discarded items are added. 


12.1.4 The art of computer programming 


We have seen that both top-down (process oriented) and bottom-up (data oriented) 
approaches have certain advantages. But there are also proponents of what might be 
termed the ‘middle-out’ method who seize on the nub and crux of the problem (the 
part that one intuitively recognizes as difficult) and, having solved that, work
outwards from the heart to the fringes. All three methods share an emphasis on 
doing things methodically, on working steadily from a partial to a complete 
solution, but the third relies more on intuition, and is therefore less safe, especially 
for beginners. Whichever you use, you will still need to think. Programming is an 
art which requires a delicate mental balancing act: while considering data 
representation you must not entirely forget the processing that is to be carried out 
on that data, and while organizing the processing you cannot afford to neglect the 
data representation. Structured programming techniques are meant to help the 
programmer; but they do not remove the need to use your head. If they could, 
programming would be no fun any more. It would all be done automatically. (This 
may happen one day.) 

One of the things you need to use your head for is considering the growth (and 
perhaps decay) of the system over time, an aspect of serious programming that 
ordinarily receives scant attention. 

Many novices fall down by trying to get a complete system, with all ‘bells and
whistles’, working at once. Far more useful than a detailed overall flowchart or 
program schematic is a plan of campaign — a timetable for what can be implemented 
when. This entails that the program be written with some dummy modules which 
do nothing to begin with but which will later be expanded to accomplish functions 
that are not immediately essential. Getting an initial, minimal version working fairly 
soon gives you encouragement and allows you to see the consequences of your 
design while there is still time to revise it. 

Even when you think the program is complete, however, it probably will not be. 
A large program or system represents a considerable investment. Frequently it 
undergoes modifications which seek to preserve as much of the system as possible 
while extending it to run in more general conditions. To prepare for this, you must 
keep your program flexible. Flexibility is always worth more than you are prepared 
to admit at the moment of coding, if only because external pressures such as 
completion deadlines or limitations on storage space seem more urgent at the time. 

One thing that greatly reduces program flexibility is lack of documentation. 
Ideally each module should be documented to indicate what it needs in the way of 
input and what it produces by way of output, as well as stating succinctly what it 
does. Without proper documentation it becomes progressively more difficult to 
introduce changes into a large, well-used program without losing control over their 
indirect effects, and without a gradual loss of understanding that eventually makes 
the program worthless — the senescence of software. 

A program in its old age is a pitiful spectacle — abandoned long ago by those 
who cared about its welfare and so full of scars, stitches and emergency transplants 
that all trace of its once-handsome form is obliterated. No one understands any 
longer how it works or, more important, why it goes wrong. It becomes a 
chronically sick geriatric patient, languishing in the terminal care ward, puzzling its 
doctors and annoying the nurses by its erratic behaviour and repeated demands for 
attention. One day the time comes when it is less trouble to put it out of its misery
than to attempt the drastic surgery necessary to restore it to its former state of 
health. 


12.2 Debugging 


OK. So you say you followed all that advice, and your program still does not work. 
Excuse me a moment while I gnash my teeth. (Grrr... .) 

In a way you are lucky: at least you know there is something wrong with your 
program. There are plenty of programs around whose users are blithely unaware 
that they are getting incorrect results. 

We can start from the assumption that you have found some sort of mistake or 
malfunction — in other words a ‘bug’. It is customary to divide program bugs into 
three categories: (1) compilation errors; (2) execution errors; and (3) logical errors. 
Each kind is more serious than the one before. We will deal with them in turn. 


12.2.1 Compilation errors 


These arise when the program violates the rules of Pascal. For that reason they are 
also called ‘syntax errors’. When an attempt is made to compile a program containing 
syntax errors the compiler rejects it and informs the user which statements are 
incorrect. The program cannot be executed. 

Normally these are the easiest to deal with. In many cases they arise from simple 
misspellings, typing mistakes, missing commas or semicolons and so forth. As you 
gain (bitter) experience you will learn the ways in which you are most likely to slip 
up, the sort of error messages they will cause the compiler to give, and the remedies 
to apply. Many old favourites result from neglecting the principle of balance. Thus 
every BEGIN must have its END; every left parenthesis needs a right parenthesis;
every opening quotation mark needs a closing quotation mark, and so on. Also 
very popular are misplaced semicolons: the rules for their use are stated in Section 
6.5. Another common mistake is forgetting to declare all the variables used in a 
program or subprogram. 

One of the most puzzling I ever came across involved the transposition of a 
multiplication sign (asterisk) and a bracket. Instead of something like 


     ... sqrt(b*2)*((n+1)/2) ...

I had typed

     ... sqrt(b*2)(*(n+1)/2) ...


in the middle of a rather long arithmetic expression. The compiler took ‘(*’ as the
start of a comment. Since this ‘comment’ was never terminated, it gave the 
misleading error message 


no end. of program


which led to many hours of frustration, as the final ‘END.’ was quite obviously
present, and this diverted attention away from the true source of the error. 

Occasionally syntax errors are symptoms of a deeper problem. What typically 
happens in such cases is that when the syntax errors are cleared up, the program 
still fails to run. 


12.2.2 Execution errors 


Anyone who has taught programming has heard the plaintive cry ‘my program is 
correct, but it won’t run’ many times. This is a fallacy. The program is not correct: 
it is very important to realize that a program which has been compiled successfully 
may still contain errors. 

Execution errors occur when a program attempts something illogical or 
impossible, even though it is syntactically valid. Typical examples are division by 
zero and trying to use a subscript outside the bounds set for the array. These errors 
are not detected until the program is run, whereupon execution is halted and an 
error message printed out. (If you are really unlucky your program may just ‘blow 
up’ or run amok and ‘crash’ the whole system!) 

Pascal systems vary in how much information they give about the place where 
the execution error occurred. For instance, the bare message 


STACK OVERRUNS HEAP 
is far less useful than 


STACK OVERRUNS HEAP 
IN PROCEDURE MAKENODE 
CALLED FROM PROCEDURE MAKELIST 


and it would be still more helpful if it pinpointed the actual line where the trouble 
arose. 

But even after discovering exactly where the program halted you may need to 
track down the source of the error elsewhere. Quite possibly the statement that 
failed is correct, but because another part of the program did not do its job properly 
it cannot carry out what it is meant to do. 

As a general rule, if the error is complex enough to puzzle you it will also puzzle 
the computer, so you must be prepared for error diagnostics that lead you astray if 
taken at face value. 


12.2.3 Logical errors 


Logical errors do not cause a program to fail, but cause it to produce the wrong 
answers. Generally speaking they are symptomatic of a poor design. Unfortunately 
there is nothing to guarantee that a program consisting entirely of valid Pascal 
statements which runs successfully to completion is actually producing the right 
results. So after assiduously removing all the syntax errors and making the changes 
necessary to let it run you may be left with a program whose output is waste paper. 
What do you do then (apart from weeping)? 

The first thing is to take a deep breath and wrench yourself away from the 
terminal. You will have an almost overpowering impulse to dive into the program, 
hacking away a few lines here, inserting a chunk of code there, and confusing the 
situation still further. It is time for thinking, not editing. So get a listing of the 
program, a printout of its results and a copy of the test data used as input and retire 
to a quiet corner to ponder your next move. (Test data? You never heard of that? 
Well, it is about time you did.) Only return to the computer when you have a 
hypothesis that could explain the program’s misbehaviour and know how you are 
going to test it. 


12.2.4 A strategy for debugging 


Let us take a step back, and approach the problem methodically. There are three 
stages in getting rid of a logical error: firstly detecting its existence; secondly 
finding where it is; and thirdly putting it right. None of these need be very simple. 

The first stage involves thorough testing. Devising really stringent test cases (and 
checking that the output they give rise to is correct) is hard work. Very often it is 
skimped. Surprisingly many programs appear on the market from reputable suppliers 
simply riddled with bugs that escaped quality control. The customers are then left 
to do the field testing. 

Since the authors of programs tend to be blind to their faults, a good rule of 
thumb is simply to show your work to a colleague or friend, or better still to let 
someone else use it for a while. You will be amazed how protective you have been 
towards your brainchild — never subjecting it to invalid data, never giving it too 
many or too few items of information, making all sorts of allowances (even without 
realizing it) for its shortcomings and, in short, only testing it on the highly restricted 
class of inputs you already knew it could deal with. A little rough handling will do 
it the world of good. Schoolchildren are particularly adept at testing software ‘to 
destruction’, so if you can persuade a 12-year-old to have a bash at your program its 
Achilles heel will soon become glaringly apparent — though informal trials of this 
kind (by ordeal) are still no substitute for really systematic testing of modules as 
they are created. 

Having found something amiss, you still need to know what part of the program 
is responsible. The art of debugging lies in forcing the program to reveal its 
behaviour patterns — which variables are being changed, which are not being 
changed, which paths are being selected, which are not being selected, and so on. 
To do this you will either have to insert additional WRITE statements to display 
critical values at key points or employ a debugging aid (if you have one) to do this 
for you. The tricky bit is to decide where to start looking, for you should always 
fight shy of blanket-coverage techniques which generate reams of unselective 
printout, such as the ‘core dump’. 
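
The WRITE-statement technique might look something like the sketch below. It is an illustration only: the program, the constant TRACING and the variables DATA and TOTAL are invented for the purpose and come from none of the case studies.

```pascal
program tracedemo(output);
(* illustrative sketch: selective tracing with WRITE statements *)
const tracing = true;   (* set to false to silence the trace *)
      n = 5;
var data : array [1 .. n] of integer;
    i, total : integer;
begin
   for i := 1 to n do data[i] := i * i;
   total := 0;
   for i := 1 to n do
   begin
      total := total + data[i];
      (* display critical values at one key point only *)
      if tracing then
         writeln('trace: i =', i:3, ' data[i] =', data[i]:4, ' total =', total:5);
   end;
   writeln('total =', total:5);
end.
```

Guarding every trace line with a single constant means the extra output can be switched off without deleting the statements, which is handy when the bug turns out not to be quite dead.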

Here, as elsewhere, ‘divide and conquer’ is a good rule. If a program or module is 
misbehaving it must be going wrong in the first half or the second half. (It could be 
wrong all through, but we will concentrate on simple cases first.) So you make it 
display the intermediate results at about half-way. ‘Aha!’ you exclaim: ‘it got this 
far and things were still all right’ or ‘oh dear! it has already gone wrong’. In either 
case you can now repeat the process on the top or bottom half, according to the 
method of bisection (see Chapter 9), halving the area of uncertainty at each stage. 
Finally, when you have narrowed it down to a single statement, the error (which 
you have looked at many times but somehow never noticed) leaps out and bites 
you on the nose like a cornered rat. (Incidentally, do not feel that time spent not 
finding errors is wasted: you have made some progress when, as the police say, a 
suspect has been ‘eliminated from enquiries’.) 

Finding an error is one thing; getting rid of it is quite another matter. Broadly 
speaking, you will experience one of two contrasting emotions when, after much 
hunting, you finally trace an error to its root cause — elation because you see at 
once how to correct it or a sinking sensation of hopelessness as you realize a gross 
oversight in your original design. Let us consider the more problematical alternative, 
when there is no straightforward ‘fix’ (presumably because, by fixing this one error, 
other inadequacies will be exposed). What this means is that your program does not 
really solve the problem it was intended to solve. This is very difficult to admit, 
and we are all prone to spend much effort delaying the admission as long as possible, 
even if it means an interminable series of ‘fixes’, each one generating the need for 
another. Yet if your program has gone wrong you must be prepared to go back to 
the drawing board and reexamine your original assumptions. There is no point in 
throwing good money after bad. The most reasonable attitude is to salvage as much 
as can be salvaged (those routines that do work) and throw away the rest. The 
second writing will be easier than the first because your understanding of the task 
has increased. Some programs are ill-fated and just have to be scrapped. It is worth 
remembering in this context that computers do exactly what you instruct them to 
do, however stupid. 

Finally, a word on back-up and recovery: even a perfect system cannot work if 
someone soaks the master diskette in coffee or wipes the data tape clean by 
pressing the wrong button on the recorder. Therefore your programs must dump 
security copies of important data from time to time, and be able to catch up to the 
state they had reached before the blunder occurred, without too much fuss. As a 
rough estimate, a really robust piece of software will devote 80% of its work to 
protecting users from the consequences of their own mistakes. 

Two good sources of further reading on these subjects are Software Design for 
Microcomputers and A Guide to Good Programming Practice (Ogdin, 1978; Meek 
and Heath, 1980). 


12.3 Design faults in Pascal 


I would not have written this book if I did not believe that Pascal is a very well 
designed language, especially suited to the kind of structured programming 
techniques recommended in this chapter; and you probably would not have read 
this far if you did not think likewise. Nevertheless it would be foolish to leave you 
with the impression that it is the greatest thing since sliced bread (or wholemeal 
loaves if you are into health foods). A brief consideration of its imperfections should 
broaden your understanding of the language. 


12.3.1 The commitment to fixed length 


To make Pascal programs easy to compile all arrays must have a fixed predetermined 
size, even when passed as parameters. Indeed the only variable-length data structure 
in Pascal is the sequential file. 

This rigidity does not lead to great benefit in terms of efficiency. Many languages 
implement ‘dynamic’ arrays, whose size is not fixed at compilation time. The lack 
of dynamic arrays is particularly irksome in connection with subprogram libraries. 

It is natural to write a procedure that can, for example, invert a matrix of any size. 
The logic does not alter if the matrix is 10 x 10 or 20 x 20 or whatever. In some 
languages (e.g. Fortran) the size of the matrix to be processed would be passed as 
an extra parameter to the procedure, but in Pascal a 10 x 10 matrix is of a different 
data type from a 20 x 20 matrix; and a parameter can only have a single type. Hence 
one procedure can only deal with one size. If you want to invert several matrices of 
different sizes in one program, you are stuck. There are ways of doing it, but they 
all entail a sacrifice of clarity (and efficiency too). 
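
One of those ways can be sketched as follows. The names MAXN, MATRIX and TRACE are assumptions made for the illustration, and summing a diagonal merely stands in for real work such as inversion. The array type is declared at the largest size expected and the order actually in use is passed as an extra parameter, Fortran-fashion.

```pascal
program anysize(output);
(* illustrative sketch: one procedure serving several matrix sizes *)
const maxn = 20;   (* assumed upper limit on the order of any matrix *)
type matrix = array [1 .. maxn, 1 .. maxn] of real;
var a : matrix;
    i, j : integer;

function trace (var m : matrix; n : integer) : real;
(* sums the leading diagonal of the n x n corner of m *)
var k : integer;
    sum : real;
begin
   sum := 0.0;
   for k := 1 to n do sum := sum + m[k,k];
   trace := sum;
end;

begin
   for i := 1 to maxn do
      for j := 1 to maxn do a[i,j] := 0.0;
   for i := 1 to 10 do a[i,i] := 1.0;
   writeln('trace of 10 x 10 corner:', trace(a,10):8:2);
   writeln('trace of  3 x  3 corner:', trace(a,3):8:2);
end.
```

The price is plain to see: every matrix occupies MAXN x MAXN elements however small it really is, and nothing stops a caller passing a value of N larger than the part that has been filled in.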

I think most Pascal programmers would agree that flexible arrays rank very high 
on the list of desirable new features in Pascal. They could be introduced without 
blowing a hole through Pascal’s type-checking mechanisms. 


12.3.2 Direct access to files 


Pascal does not have direct or random access to files. In commercial terms, this is 
the killer. Modern data processing would grind to a halt if this facility was 
prohibited. Accordingly several implementations, including the original CDC 6000 
version supervised by Wirth himself, provide some means of direct access to records 
in files via predefined procedures and functions. However these are non-standard, 
promoting incompatibility between machines; and one of Pascal’s strengths is the 
portability of Pascal programs. 

It would be better if a recommended set of direct-access routines was agreed 
upon and promulgated, and better still if the concept was brought into the language 
— either by allowing ‘virtual’ arrays, in which some elements are actually held on 
backing store, or by having files in which any individual record could be selected 
by its position. 


12.3.3 Strings 


A string is a sequence of characters. The distinctive feature of a string is that its 
length can vary, so here again we run into Pascal’s prohibition on variable-length 
data. Officially the Pascal programmer has to resort to packed arrays of CHAR, 
which are too rigid and waste space since they must be declared big enough for the 
longest anticipated string, or to files of CHAR, which are too slow. Yet many more 
primitive languages (Basic among them) provide variable-length strings and the 
operators to manipulate them. Some Pascal implementations do so too (notably 
UCSD Pascal) but at the cost of reducing program portability (see above). There is a 
need for a string datatype and an acknowledged standard set of string-handling 
routines. 
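
To show what the official route involves, here is a minimal sketch of a do-it-yourself varying string. The type VSTRING and its fields are hypothetical names chosen for the example, not part of any Pascal implementation.

```pascal
program strdemo(output);
(* illustrative sketch: a fixed-length array used as a varying string *)
const maxlen = 20;
type vstring = record
                  len : 0 .. maxlen;                        (* characters in use *)
                  ch : packed array [1 .. maxlen] of char;  (* padded with spaces *)
               end;
var s : vstring;
    i : integer;
begin
   s.ch := 'Pascal              ';   (* the literal must be exactly maxlen long *)
   (* find the true length by skipping trailing spaces *)
   i := maxlen;
   while (i > 1) and (s.ch[i] = ' ') do i := i - 1;
   if s.ch[i] = ' ' then i := 0;
   s.len := i;
   for i := 1 to s.len do write(s.ch[i]);
   writeln(' has', s.len:3, ' characters');
end.
```

Every string, long or short, takes up MAXLEN characters, and operations such as concatenation have to be written by hand on top of this.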


12.3.4 Garbage collection 


Finally there is the problem of garbage collection, alluded to in Section 12.1.3. It is 
easy enough to build linked structures but eventually free storage is exhausted, and 
then it is difficult to release space occupied by unwanted data. The DISPOSE 
procedure is far from satisfactory for most applications, so the net result is that the 
programmer using dynamic storage allocation has to give so much thought to the 
management of free store that the advantages of dynamic structuring are lost. It 
may well be simpler to use static allocation instead. 

As the man said: if they won’t collect the garbage, why pay the rates? 


12.4 Projects 


I am sure most of you have better things to do with your time than write programs 
you do not even need, but for those readers who do not feel they have got their 
money’s worth without a really tough assignment, here are some suggestions for 
whiling away the hours. They are ‘real’ programming problems, chunky enough to 
make demands on the design and debugging skills we have been discussing in this 
chapter. 


1. Look back at Section 11.1 and investigate how SIMP could be made part of a 
subprogram library on your system, then do so and test it on at least two different 
functions. 


2. Take the estate-agent example from Section 11.2 and turn it into a workable 
system. A good idea, if you can pull it off, would be to visit your local estate agents 
and get commissioned to computerize their records. They buy a microcomputer 
with Pascal and pay you to implement the system on it. (Well, you have to try.) 
You will certainly find out that way which facilities are needed and which are 
options. 


3. If you have not already done so, improve the CHOPPER program shown at the 
end of Chapter 11 in the ways suggested by adding additional simplifications, by 
re-using unwanted store and by recognizing the priority of operators so that 
complete bracketting is not required for input. 

That is really just a launching pad for a program that will be of genuine use in 
the design and construction of digital circuits. Assuming you are still keen, the 
second stage is to accept a system of more than one boolean equation, simplify it, 
and display the output not only as logical expressions but also in graphic form, 
e.g. as a circuit diagram on a CRT screen. That should keep you busy for a while; 
it might also earn you some money. 


4. Write a program to do symbolic differentiation. This will draw on many of the 
techniques of exercise 3 (upgraded CHOPPER). Expressions can be held as trees. 
The rules for differentiating with respect to X, where A is any constant and E1, E2 
are expressions, are as follows. 


diff(a) = 0; diff(x) = 1 
diff(e1 + e2) = diff(e1) + diff(e2) 
diff(e1 - e2) = diff(e1) - diff(e2) 
diff(e1 * e2) = diff(e1) * e2 + e1 * diff(e2) 
diff(e1 / e2) = (diff(e1) * e2 - e1 * diff(e2)) / e2 ^ 2 
diff(e1 ^ e2) = (e1 ^ e2) * (diff(e2) * ln(e1) + e2 * diff(e1) / e1) 


A useful special case of the last one is the simple power rule. 


diff(x ^ a) = a * x ^ (a - 1) 


You can look up the more complex rules in any textbook on calculus, though this 
should be quite enough to be getting on with. 
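
As a starting point only, the sum and product rules might be sketched on expression trees like this. The record layout and the names LINK, NODE, LEAF, JOIN and SHOW are assumptions made for the sketch; CHOPPER's own data structures may well differ, and the remaining rules are left for you.

```pascal
program diffdemo(output);
(* illustrative sketch: differentiating an expression tree *)
type link = ^node;
     node = record
               op : char;        (* '+', '*', 'x' for the variable, 'c' for a constant *)
               value : integer;  (* used only when op = 'c' *)
               left, right : link;
            end;
var e : link;

function leaf (op : char; value : integer) : link;
(* builds a constant or variable node *)
var p : link;
begin
   new(p);
   p^.op := op; p^.value := value;
   p^.left := nil; p^.right := nil;
   leaf := p;
end;

function join (op : char; l, r : link) : link;
(* builds an operator node over two subtrees *)
var p : link;
begin
   new(p);
   p^.op := op; p^.value := 0;
   p^.left := l; p^.right := r;
   join := p;
end;

function diff (e : link) : link;
(* applies the constant, variable, sum and product rules *)
begin
   case e^.op of
      'c' : diff := leaf('c', 0);
      'x' : diff := leaf('c', 1);
      '+' : diff := join('+', diff(e^.left), diff(e^.right));
      '*' : diff := join('+', join('*', diff(e^.left), e^.right),
                              join('*', e^.left, diff(e^.right)));
   end;
end;

procedure show (e : link);
(* prints a tree fully bracketed *)
begin
   case e^.op of
      'c' : write(e^.value:1);
      'x' : write('x');
      '+', '*' :
         begin
            write('('); show(e^.left);
            write(' ', e^.op, ' '); show(e^.right); write(')');
         end;
   end;
end;

begin
   e := join('*', leaf('x', 0), leaf('x', 0));   (* the expression x * x *)
   show(diff(e)); writeln;
end.
```

Note that the product rule re-uses the operand subtrees rather than copying them, so the derivative shares nodes with the original expression; that matters as soon as you start disposing of unwanted store.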

It will be seen that the DIFF procedure will be heavily recursive. What may not 
be so obvious is that the expression produced as derivative will have many 
redundancies, such as 


((x1)+0) + (1%(14+1) + (x#*(x*0))) 
where 


x * 2 


is what we want. 
To resolve this a TIDY procedure, similar in principle to that in CHOPPER, will 
be needed. Its rules will include the following. 


0 + x = x + 0 = x 
x - 0 = x 
0 - x = -x 
0 * x = x * 0 = 0 
1 * x = x * 1 = x 
x + x = 2 * x 
x - x = 0 
x * x = x ^ 2 
x ^ 0 = 1 
x ^ 1 = x 


It should also work out N + M, N - M, N * M and N ^ M where N and M are 
constants. 


5. A search program that looked up words and retrieved associated information was 
presented in Chapter 9; but if the words are slightly misspelt it will usually fail to 
find them. Consider the situation where the words are people’s names and we need 
to retrieve information about those individuals. Live systems cannot always rely on 
perfect spelling: the data recorded on Richard S. Forsyth should be accessible when 
S.R. Forsyth or R. Forsyte or R.S. Forsythe or maybe even Richard Forster is 
presented. This is the problem of partial matching. Try to devise a comparison 
scheme which, if no perfect match is found, will list the next 3 or 4 most promising 
candidates so that the operator can decide by inspection whether to proceed. 

You may work on penalty points to arrive at a difference score: a difference in 
the initial letter of the surname gets bad marks, but the final letter is less serious; 
some letters (e.g. e and i) are more interchangeable than others (e.g. e and x); 
ordering is more important than absolute position, and so on. 

One formulation of this matching task is to take one name as given and the other 
as goal and see how many elementary transformations — removing a letter, inserting 
a letter, changing a letter, and so on — are needed to get from the given to the goal. 
To save time the process can be abandoned after a certain number of steps: too 
many changes indicate a very poor match. 
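
A minimal sketch of this formulation follows, using the familiar table-filling method for counting removals, insertions and changes. The names NAMETYPE, TRUELEN, MIN3 and DISTANCE are invented for the example, every transformation is charged one penalty point, and the early abandonment mentioned above is left out for brevity.

```pascal
program editdist(output);
(* illustrative sketch: counting elementary transformations between names *)
const maxlen = 10;
type nametype = packed array [1 .. maxlen] of char;
var given, goal : nametype;

function min3 (x, y, z : integer) : integer;
var m : integer;
begin
   m := x;
   if y < m then m := y;
   if z < m then m := z;
   min3 := m;
end;

function truelen (var s : nametype) : integer;
(* length of s ignoring trailing spaces *)
var i : integer;
begin
   i := maxlen;
   while (i > 1) and (s[i] = ' ') do i := i - 1;
   if s[i] = ' ' then i := 0;
   truelen := i;
end;

function distance (var a, b : nametype) : integer;
(* fewest removals, insertions and changes that turn a into b *)
var d : array [0 .. maxlen, 0 .. maxlen] of integer;
    i, j, la, lb, cost : integer;
begin
   la := truelen(a); lb := truelen(b);
   for i := 0 to la do d[i,0] := i;
   for j := 0 to lb do d[0,j] := j;
   for i := 1 to la do
      for j := 1 to lb do
      begin
         if a[i] = b[j] then cost := 0 else cost := 1;
         d[i,j] := min3(d[i-1,j] + 1,        (* remove a letter *)
                        d[i,j-1] + 1,        (* insert a letter *)
                        d[i-1,j-1] + cost);  (* change a letter *)
      end;
   distance := d[la,lb];
end;

begin
   given := 'forsyte   ';
   goal  := 'forsyth   ';
   writeln('forsyte -> forsyth :', distance(given,goal):3);
   goal  := 'forster   ';
   writeln('forsyte -> forster :', distance(given,goal):3);
end.
```

To bring in the penalty-point ideas of the previous paragraph, the constant costs of 1 could be replaced by values that depend on the letters involved and on their position in the name.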

Another approach is to regard both names as sets of characters and to carry out 
two subprocesses. First add all characters in the goal set not present in the given 
set and delete all characters from the given set not found in the goal, scoring 
penalties as appropriate. Second consider the ordering by summing the distance 
each character has to be moved to reach its true position. 

This is a real-life problem, so there is no perfect solution. You can have fun 
tuning your similarity algorithm so that it gives good results. A dip into a telephone 
directory will provide all the raw data you need, and more. 


6. Write a program that does something useful, and works. 


7. Repeat the previous exercise till you grow tired. 


PART TWO 


Pascal at Work 


‘Careful study and imitation of good programs leads to better 
writing.’ 


Kernighan and Plauger, Software Tools 


13 
Case study 1 


(Sorting) 


First of all, to all those of you who have read this far — a word of congratulation. 
If, in addition, you worked your way through the examples and exercises, you 
deserve a medal. (Who are you trying to kid?) 

If, on the other hand, you took the short cut direct from Chapter 1 to see how 
it was going to turn out. . . well, now you are here, you might as well stick around; 
you could learn something. 


13.1 Sorting files 


Sorting is the re-arrangement of data into ascending or descending order. Computers 
spend a great deal of their time sorting, so the study of efficient sorting methods 
has received much attention. Sorting is usually carried out to make retrieval of 
information easier, either for man or machine. The reason for this is obvious if you 
consider the effort needed to find a telephone number in a directory where 
subscribers are listed in a totally haphazard order. 

Two kinds of computer sorting may be distinguished — internal and external 
methods. Internal sorting algorithms can be used if the amount of data is small 
enough to fit into the computer’s main memory. Internal sorting is, in general, 
faster than external sorting because access to main memory is faster than to discs, 
tapes, etc. It should be used where applicable. Two internal methods have already 
been presented, Selection Sort (Chapter 8) and Shell Sort (Chapter 9) and two 
better methods will be shown at the end of this chapter, namely Quicksort and 
Heapsort. (Please, those who have been exposed to it, never use the Bubble Sort 
again: it is just about the slowest method ever devised.) 

We will, however, concentrate on external sorting — where there is too much 
data to fit into main memory — because it is in many ways a more challenging 
problem. In scientific work a few thousand integers constitute a big data set, but 
in commercial data processing hundreds of thousands or even millions of records, 
each containing several hundred characters, may need to be sorted. This is the 
kind of workload that the clearing banks have to deal with every night; and clearly 
internal sorting is not the answer. 

The strategy of most external sorting methods is to split the main sequence 
into subsequences which are ordered by an internal method and then joined 
together by a merging procedure. The greater the size of available main storage 
the longer the subsequences can be, and hence the fewer of them. This in turn 
means that the sort will progress faster, since access time to main store is much 
shorter than to backing store. 

Nevertheless it is possible to sort a file without using any internal sort procedure, 
and our case study will employ pure merging because (1) it shows how large files 
can be sorted with very small memory requirements; (2) it demonstrates the 
principle of merging, at the heart of all external sorts, without distractions. 

Two sorted sets can be merged into one longer set, still sorted, by taking items 
pairwise from the input sets and sending the lower one (or the higher one, for 
descending order) to the output set. The item transferred is then replaced by the 
next item of the set from which it came. This is repeated till one set is exhausted, 
when the remainder of the other set (if any) can be transferred to the output set. 
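
The merging step just described can be sketched with arrays standing in for the input and output sets. This is only an illustration, with made-up data and names, and is not the file-based procedure used in the case study.

```pascal
program mergedemo(output);
(* illustrative sketch: two-way merge of sorted integer arrays *)
const na = 4; nb = 5; nc = 9;   (* nc = na + nb *)
var a : array [1 .. na] of integer;
    b : array [1 .. nb] of integer;
    c : array [1 .. nc] of integer;
    i, j, k : integer;
begin
   a[1] := 4;  a[2] := 32; a[3] := 48; a[4] := 95;
   b[1] := 15; b[2] := 20; b[3] := 50; b[4] := 53; b[5] := 89;
   i := 1; j := 1; k := 0;
   (* send the lower of the two front items to the output set *)
   while (i <= na) and (j <= nb) do
   begin
      k := k + 1;
      if a[i] <= b[j] then
      begin
         c[k] := a[i]; i := i + 1;
      end
      else
      begin
         c[k] := b[j]; j := j + 1;
      end;
   end;
   (* one set is exhausted: transfer the remainder of the other *)
   while i <= na do
   begin
      k := k + 1; c[k] := a[i]; i := i + 1;
   end;
   while j <= nb do
   begin
      k := k + 1; c[k] := b[j]; j := j + 1;
   end;
   for k := 1 to nc do write(c[k]:4);
   writeln;
end.
```

Only the two front items are ever compared, which is why the same logic carries over to serial files where nothing but the current record is available.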

This principle can be used as the basis for sorting by splitting the input set into 
ordered subsequences which are then merged together. (Note that a subsequence 
of length 1 is trivially ordered.) By repeated merging the subsequences grow longer 
and longer until the whole set is in order. Merging is typically applied to external 
sorting with files, but it may also be used internally with arrays. 

We have outlined two-way merging, where two input sets are merged onto one 
output set. However the same logic may be extended to three or more input sets, 
comparing three or more items and selecting the minimum (or maximum) for 
output. Three-way and four-way merges are sometimes used in practice. 

It is worth pausing to ask yourself: why is merging especially suited to external 
sorting? The answer should be apparent: only two (or at most a few) records need 
be present in main memory at one time, and the input and output sets can be held 
naturally on a medium that allows only serial access, e.g. magnetic tape. 

The classic four-tape sort is the basic method of sorting serial files. It is chosen 
here in preference to the slightly more efficient, but vastly more complicated, 
Polyphase sort. 


13.2 The classic four-tape sort 


There are two main phases in this process — first distribution, then merging. 


13.2.1 Distribution phase 


In the distribution phase the input items are read one by one and placed onto 
two of the four ‘tapes’ (sequential files in this case) so that any ordering that 
already exists is preserved. This point is important: it is foolish to write a program 
that does not take advantage of work already done for it. 

The data is thus arranged into one or more subsequences on each of two files. 
For example, using positive integers for illustration, the input 


97 20 53 04 32 15 50 89 48 95 04 37 
would be divided into six ascending subsequences 


t1 97, 04 32, 48 95 
t2 20 53, 15 50 89, 04 37 


placed alternately on two files. 

We have marked sequence breaks with commas, but this is just for the reader’s 
convenience. The question is: how does the program tell when a sequence ends? 
The answer, since our desired order is non-descending, is that a subsequence ends 
when an item smaller than its predecessor is read (or when the end of file is 
encountered). This definition even allows fortuitous merging of successive 
subsequences as on t1 above where 48 fits snugly after 32 though they belonged 
to separate sequences. (Did you think it was an accident?) Again, this can save 
work, so our method should take advantage of it. 
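
The end-of-sequence rule can be tried out on the example numbers with a sketch like the one below. It uses integer arrays in place of files, and the names DATA, TAPE, LAST and CURRENT are assumptions made for the illustration; it is not procedure DISTRIBU itself.

```pascal
program rundemo(output);
(* illustrative sketch: splitting input into ascending runs on two 'tapes' *)
const n = 12;
var data : array [1 .. n] of integer;
    tape : array [1 .. n] of integer;   (* tape number given to each item *)
    i, last, current : integer;
begin
   data[1] := 97;  data[2] := 20;  data[3] := 53;  data[4] := 4;
   data[5] := 32;  data[6] := 15;  data[7] := 50;  data[8] := 89;
   data[9] := 48;  data[10] := 95; data[11] := 4;  data[12] := 37;
   current := 1;
   last := -maxint;   (* lower than any genuine item *)
   for i := 1 to n do
   begin
      (* an item smaller than its predecessor ends the sequence *)
      if data[i] < last then current := 3 - current;   (* switch tapes *)
      tape[i] := current;
      last := data[i];
   end;
   for current := 1 to 2 do
   begin
      write('t', current:1, ' ');
      for i := 1 to n do
         if tape[i] = current then write(data[i]:3);
      writeln;
   end;
end.
```

The two lines printed should agree with the division onto t1 and t2 shown above, minus the commas.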

These ideas are incorporated in procedure DISTRIBU in the program of 
Section 13.3. 


13.2.2 Merging phase 


Having distributed the subsequences, the program must then repeatedly merge 
two shorter sequences into one longer sequence first from t1 and t2 onto t3 and 
t4 alternately and then back from t3 and t4 onto t1 and t2 alternately until only 
one long sequence is left — the sorted data. Continuing with the example numbers 
above, this process would go through the major steps outlined below. Do not read 
further until you have grasped the idea. 


t1 97, 04 32 48 95 
t2 20 53, 15 50 89, 04 37 


t3 20 53 97, 04 37 
t4 04 15 32 48 50 89 95 


t1 04 15 20 32 48 50 53 89 95 97 
t2 04 37 


t3 04 04 15 20 32 37 48 50 53 89 95 97 


It is the job of procedure MERGESUB in the program to perform a single 2-on-1 
merge: the main program calls MERGESUB repeatedly, keeping track of which pair 
of input files to use as source and which output file to use as destination. When 
both input files are empty, sources and destinations change places, until the merging 
is complete. 

MERGESUB is quite simple because most of the testing for end-of-file and 
end-of-sequence is performed in the lower level procedure NEXT which delivers, if 
it can, the next record of an input file. If however the end of file has been reached 
it supplies instead a dummy record HUGE that must be greater than any genuine 
record and sets a Boolean parameter FEND to indicate end of file. If the next 
record would be smaller than the current one (end of sequence) it does not read 
on but supplies HUGE. MERGESUB will compare HUGE with the genuine record 
(from the other file) and always pass the genuine one to the output file as it is 
smaller — ensuring that an unfinished input sequence is flushed out when the 
other one finishes, and catering for empty sequences in the process. (You may be 
interested to compare this with the merging technique presented at the end of 
Chapter 9.) 

The function LESS performs a character-by-character comparison of two LINES 
(packed arrays of CHAR) from a specified starting point to a specified end position. 
Since DEC-10 Pascal pads out a packed array of characters with spaces if it is too 
short when read, all lines are treated as the same length. The user can thus specify 
a segment anywhere in the line as a ‘key field’ for sorting. For instance the lines 
could be ordered on the basis of characters 11 to 20 inclusive. 

Notice that we are sorting TEXT files. Most textbooks deal with numbers in 
their expositions of sorting; but most real-life applications use character sequences. 
Also, this way, I get a useful program out of it, one that helps prepare an index for 
this book. 

The program follows, accompanied by a sample run. The input file used and the 
output file produced are also shown. 

(Procedures INIT, MONITOR and PROFILE are not involved in the sorting. 
They are concerned with monitoring the program; see Section 13.4.) 


13.3 A program of sorts 


13.3.1 Program TAPESORT 


program tapesort(inpfile,outfile); 
(* the classic 4-tape sort *) 
(* by Richard Forsyth, PNL, June 1981 *) 

label 99; (* emergency exit *) 


const linesize = 55; 
      tiny = '                                                       '; 
      (* low valued string of linesize characters *) 
      huge = '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'; 
      (* high valued string constant; '~' is an assumed fill character *) 
      subs = 8; (* no. of subprograms for frequency profile *) 


type 
   line = packed array [1 .. linesize] of char; 
   (* a line is a fixed-length string *) 
   linefile = file of line; 

var 
   flip,flop : boolean; 
   c1,c2,p1,p2,pass,step : integer; 
   inpfile,outfile : text; 
   t1,t2,t3,t4 : linefile; (* the four 'tapes' *) 
   procfreq : array [1 .. subs] of integer; 
   procname : array [1 .. subs] of alfa; 


procedure monitor (p : integer); 
(* increments usage count for subprogram p *) 
begin 
   procfreq[p] := procfreq[p] + 1; 
   procfreq[1] := procfreq[1] + 1; (* self also *) 
end; (* of monitor *) 


procedure init; 
(* initialization, mainly for frequency statistics *) 
var i : integer; 
begin 
   for i := 1 to subs do procfreq[i] := 0; 
   procname[1] := 'monitor   '; 
   procname[2] := 'init      '; 
   procname[3] := 'less      '; 
   procname[4] := 'distribu  '; 
   procname[5] := 'copyover  '; 
   procname[6] := 'next      '; 
   procname[7] := 'mergesub  '; 
   procname[8] := 'profile   '; 
   monitor(2); 
end; (* of init *) 
function less (var a,b : line) : boolean; 
(* tests whether line a is less than line b *) 
var p : integer; 
    diff : boolean; 
begin 
   p := p1 - 1; (* p1 is global start position *) 
   repeat 
      p := p + 1; 
      diff := a[p] <> b[p]; 
   until diff or (p >= p2); 
   (* p2 is global end position *) 
   less := a[p] < b[p]; (* strictly less than *) 
   monitor(3); 
end; 


procedure distribu (var from : text; var f1,f2 : linefile); 
(* places sequences alternately on f1 and f2, 
   preserving any ordering that already exists *) 
var last, this : line; 
    recs,seqs : integer; 
    pingpong : boolean; 
begin 
   recs := 0; seqs := 0; 
   reset(from); 
   rewrite(f1); rewrite(f2); 
   last := tiny; 
   pingpong := true; 
   while not eof(from) do 
   begin 
      read(from,this); (* all right on DEC-10 *) 
      if eoln(from) then readln(from); 
      recs := recs + 1; 
      if less(this,last) then (* break in sequence *) 
      begin 
         seqs := seqs + 1; 
         pingpong := not pingpong; 
      end; 
      (* switch output files *) 
      if pingpong then 
      begin 
         f1^ := this; put(f1); 
      end 
      else 
      begin 
         f2^ := this; put(f2); 
      end; 
      last := this; (* latest record *) 
   end; (* of distribution loop *) 
   writeln('input file of',recs:8,' lines split into',seqs:7,' sequences.'); 
   monitor(4); 
end; (* of distribu *) 


procedure copyover (var from : linefile; var into : text); 
(* copies sorted result file in readable form *) 
begin 
   reset(from); 
   rewrite(into); 
   while not eof(from) do 
   begin 
      writeln(into,from^); 
      get(from); 
   end; 
   monitor(5); 
end; (* of copyover *) 


procedure next (var f : linefile; var frec : line; var fend : boolean); 
(* gets next line into frec unless end of file or sequence *) 
begin 
   if eof(f) then 
   begin 
      fend := true; frec := huge; 
   end; 
   (* huge is a device to force unfinished file to be flushed out *) 
   if not fend then 
      if less(f^,frec) then (* end of sequence *) 
      begin 
         fend := true; frec := huge; 
      end 
      else 
      begin 
         frec := f^; get(f); (* next item *) 
      end; 
   monitor(6); 
end; (* of next *) 
procedure mergesub (var a,b,c : linefile); 
(* merges a sequence from a and 1 from b onto c *) 
var afin, bfin : boolean; 
    arec,brec : line; 
begin 
   afin := false; bfin := false; 
   arec := tiny; brec := tiny; (* global low value *) 
   next(a,arec,afin); 
   next(b,brec,bfin); 
   while not (afin and bfin) do 
      if less(arec,brec) then 
      begin (* take from a *) 
         c^ := arec; put(c); 
         next(a,arec,afin); 
      end 
      else 
      begin (* take from b *) 
         c^ := brec; put(c); 
         next(b,brec,bfin); 
      end; 
   monitor(7); 
end; (* of mergesub *) 


procedure profile; 
(* prints out frequency statistics *) 
var i,t : integer; 
    pc : real; 
begin 
   monitor(8); 
   writeln; writeln('subprogram frequency statistics:'); 
   writeln; 
   t := procfreq[1]; 
   for i := 2 to subs do 
   begin 
      pc := 100 * procfreq[i] / t; 
      writeln(i:4,procname[i]:12,procfreq[i]:7,pc:10:4); 
   end; 
   writeln; 
end; (* of profile *) 


begin (* main line *)
init;
writeln;
writeln('Classic 4-tape sort program.');
writeln;
write('position of first character in key field? '); read(p1);
write('position of last character in key field? '); read(p2);
if p2 > linesize then p2 := linesize;
if (p1 < 1) or (p1 > p2) then
  begin (* error in key field specification *)
  writeln('impossible character position!');
  writeln('1 to ',linesize:4,' are the limits.');
  goto 99; (* quit *)
  end;
writeln;


c1 := clock; (* start cpu timer *)
distribu(inpfile,t1,t2);


flip := false;
pass := 0;
writeln('sort in progress ....');
writeln;
repeat
  flip := not flip;
  step := 0;


  if flip then (* t1,t2 are sources; t3,t4 destinations *)
    begin
    reset(t1); reset(t2);
    rewrite(t3); rewrite(t4);
    flop := false;
    while not (eof(t1) and eof(t2)) do
      begin
      flop := not flop;
      if flop then mergesub(t1,t2,t3)
      else mergesub(t1,t2,t4);
      step := step + 1;
      end;
    end
  else (* t3,t4 are sources; t1,t2 destinations *)
    begin
    reset(t3); reset(t4);
    rewrite(t1); rewrite(t2);
    flop := false;
    while not (eof(t3) and eof(t4)) do
      begin
      flop := not flop;
      if flop then mergesub(t3,t4,t1)
      else mergesub(t3,t4,t2);
      step := step + 1;
      end;
    end;
  (* would be simpler if Pascal allowed file assignment *)


  pass := pass + 1;
  writeln('phase',pass:4,' done in',step:4,' steps.');
until step <= 1; (* only one sequence left *)


(* now produce the output *)
if flip then
  if flop then copyover(t3,outfile)
  else copyover(t4,outfile)
else
  if flop then copyover(t1,outfile)
  else copyover(t2,outfile);
c2 := clock;
writeln;
writeln('sorting complete after',(c2-c1)/1000:8:4,' seconds runtime.');


99:
profile; (* subprogram usage profile *)
end.


13.3.2 Input file INPFILE 


birthday  prog  2.1   calculates week day from date
inflater  prog  5.3   demonstrates effect of inflation
pivalue   prog  6.3   calculates approximation to pi
average   part  6.4   finds mean and maximum
decline   prog  6.7   tabulates asset depreciation
easy      prog  7.1   prints table of square roots and logs
veryeasy  prog  7.1   prints table of square roots and logs
yardmile  proc  7.2   conversion from yards to miles
speedmph  func  7.4   speed in mph from metres and seconds
highfact  func  7.5   highest common factor (recursive)
chances   prog  7.6   binomial probability calculations
counter1  prog  8.2   frequency count for e
counter2  prog  8.2   frequency count for vowels
counter3  prog  8.2   frequency count for letters
counter4  prog  8.3   frequency count for letter pairs
showoff   prog  8.6   displays chess game as it progresses
quantile  proc  8.7   median and other percentiles
intcount  prog  9.2   number tallying
putwords  prog  9.2   writes word list to file
searcher  prog  9.3   binary search for acronyms
merging   part  9.4   two-way merge process
sherlock  prog  10.1  logical detective work
selector  prog  10.2  selection of student records
bigdeal   prog  10.3  bridge hands and scoring
simpson   prog  11.1  integration by simpson’s rule
dwelling  part  11.2  skeleton of estate agent system
linklist  prog  11.3  reverses a character sequence
chopper   prog  11.4  simplifies boolean expressions
tapesort  prog  13.3  merge sorting of external files


13.3.3 Sample run 


INPFILE = BOOK.EG 
OUTFILE = BOOK.OUT 


CLASSIC 4-TAPE SORT PROGRAM. 


POSITION OF FIRST CHARACTER IN KEY FIELD? 1
POSITION OF LAST CHARACTER IN KEY FIELD? 8


INPUT FILE OF 29 LINES SPLIT INTO 11 SEQUENCES.
SORT IN PROGRESS ....


PHASE 1 DONE IN 5 STEPS. 
PHASE 2 DONE IN 3 STEPS. 
PHASE 3 DONE IN 2 STEPS. 
PHASE 4 DONE IN 1 STEPS. 


SORTING COMPLETE AFTER 0.3810 SECONDS RUNTIME. 
SUBPROGRAM FREQUENCY STATISTICS: 


   2    INIT          1    0.2353
   3    LESS        272   64.0000
   4    DISTRIBU      1    0.2353
   5    COPYOVER      1    0.2353
   6    NEXT        138   32.4706
   7    MERGESUB     11    2.5882
   8    PROFILE       1    0.2353


The output appears in upper case since DEC-10 Pascal converts all letters to upper
case unless specifically told not to.


AVERAGE   PART  6.4   FINDS MEAN AND MAXIMUM
BIGDEAL   PROG  10.3  BRIDGE HANDS AND SCORING
BIRTHDAY  PROG  2.1   CALCULATES WEEK DAY FROM DATE
CHANCES   PROG  7.6   BINOMIAL PROBABILITY CALCULATIONS
CHOPPER   PROG  11.4  SIMPLIFIES BOOLEAN EXPRESSIONS
COUNTER1  PROG  8.2   FREQUENCY COUNT FOR E
COUNTER2  PROG  8.2   FREQUENCY COUNT FOR VOWELS
COUNTER3  PROG  8.2   FREQUENCY COUNT FOR LETTERS
COUNTER4  PROG  8.3   FREQUENCY COUNT FOR LETTER PAIRS
DECLINE   PROG  6.7   TABULATES ASSET DEPRECIATION
DWELLING  PART  11.2  SKELETON OF ESTATE AGENT SYSTEM
EASY      PROG  7.1   PRINTS TABLE OF SQUARE ROOTS AND LOGS
HIGHFACT  FUNC  7.5   HIGHEST COMMON FACTOR (RECURSIVE)
INFLATER  PROG  5.3   DEMONSTRATES EFFECT OF INFLATION
INTCOUNT  PROG  9.2   NUMBER TALLYING
LINKLIST  PROG  11.3  REVERSES A CHARACTER SEQUENCE
MERGING   PART  9.4   TWO-WAY MERGE PROCESS
PIVALUE   PROG  6.3   CALCULATES APPROXIMATION TO PI
PUTWORDS  PROG  9.2   WRITES WORD LIST TO FILE
QUANTILE  PROC  8.7   MEDIAN AND OTHER PERCENTILES
SEARCHER  PROG  9.3   BINARY SEARCH FOR ACRONYMS
SELECTOR  PROG  10.2  SELECTION OF STUDENT RECORDS
SHERLOCK  PROG  10.1  LOGICAL DETECTIVE WORK
SHOWOFF   PROG  8.6   DISPLAYS CHESS GAME AS IT PROGRESSES
SIMPSON   PROG  11.1  INTEGRATION BY SIMPSON’S RULE
SPEEDMPH  FUNC  7.4   SPEED IN MPH FROM METRES AND SECONDS
TAPESORT  PROG  13.3  MERGE SORTING OF EXTERNAL FILES
VERYEASY  PROG  7.1   PRINTS TABLE OF SQUARE ROOTS AND LOGS
YARDMILE  PROC  7.2   CONVERSION FROM YARDS TO MILES


13.4 Discussion 


One thing to notice about TAPESORT is that it could have been a lot shorter. A
sizeable fraction of the program is concerned with monitoring its own performance.
The procedures MONITOR, INIT and PROFILE are purely for ‘introspection’: they
do not help with the sorting. (In fact they slow it down slightly.) But they are there
for a good reason.

Once a program is working (not before!) we may wish to consider ways of 
increasing its efficiency. We might want TAPESORT to run faster, for instance. To 
do this, we need to know if it has any ‘bottlenecks’: sections of program where it
spends an inordinate amount of time. It is futile to ‘optimize’ a procedure that is 
only executed once or twice (unless it is incredibly slow). 

In TAPESORT we find the procedures LESS and NEXT dominate the frequency 
profile. Over 96% of subprogram calls are to one of those two. This means that time 
spent improving any other parts of the program, even MERGESUB, will probably 
be wasted. Such information is extremely valuable: it shows us where to concentrate 
our efforts. Thus 1/100th second saved in LESS is more than twice as beneficial as 
1/10th second saved in MERGESUB. Some systems provide this kind of frequency 
profile automatically; however, even if yours does not, this example shows that it 
is not hard to achieve within standard Pascal. Using the pre-defined function 
CLOCK, which returns the runtime of the current program in milliseconds, it 
would even be possible to work out how long the program spent in each procedure, 
rather than how often it was executed; though in most cases the frequency is a 
good enough guide. 
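
The comparison between LESS and MERGESUB can be checked from the call counts in the sample run (Section 13.3.3). The short program below is only a sketch of that arithmetic: the savings per call are the hypothetical figures quoted in the text, not measurements, and the constant names are invented for the illustration.

```pascal
program savings;
(* compares the total time saved by speeding up LESS or MERGESUB, *)
(* using the call counts printed in the sample run *)
const lesscalls  = 272;   (* calls of LESS in the sample run *)
      mergecalls = 11;    (* calls of MERGESUB in the sample run *)
      lesssave   = 0.01;  (* hypothetical saving per call of LESS, seconds *)
      mergesave  = 0.1;   (* hypothetical saving per call of MERGESUB *)
var lesstotal, mergetotal : real;
begin
lesstotal  := lesscalls * lesssave;
mergetotal := mergecalls * mergesave;
writeln('saved in LESS     ', lesstotal:8:2, ' seconds');
writeln('saved in MERGESUB ', mergetotal:8:2, ' seconds');
writeln('ratio             ', lesstotal/mergetotal:8:2);
end.
```

On these figures the saving in LESS comes to 2.72 seconds against 1.10 seconds in MERGESUB, a ratio of roughly 2.5.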

For serious software a study of the program’s actual behaviour is an indispensable 
prelude to any attempt at improvement. After the program has been ‘tuned’ the 
monitoring routines can be removed, and it will go even faster! 

To turn this sorting program into a really useful piece of software two 
enhancements are still required, which the reader may care to attempt. In the first 
place, it should allow the user at least two ‘key’ fields, and preferably three or four. 
At present it only uses one, indicated by character position. For example, the test 
data (Section 13.3.2) may be considered as having four fields — name, type, section 
number and description. We might very well want to sort by name (minor key) 
within type (major key) so that all items of the same type — programs, procedures, 
functions and so on — come together, but within each type the order is alphabetic. 
As TAPESORT stands this is not possible: we cannot specify that the primary key 
consists of characters 11 to 14 and the secondary key characters 1 to 8, as we 
would like to do. The reader is invited (urged) to devise a good way of implementing 
secondary, or multiple, keys. 
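
One possible starting point, offered as a sketch rather than as the intended solution, is to compare on the major key first and consult the minor key only when the major keys are equal. The names KEYCOMP and LESS2 are invented here, the key positions are fixed in the code instead of being read from the user, and the line length is shortened to 20 characters to keep the example small.

```pascal
program twokeys;
(* sketch: comparison on a major key, with a minor key to break ties *)
const linesize = 20;   (* shortened line length, for illustration only *)
type line = packed array [1..linesize] of char;
var x, y, z : line;

function keycomp (var a,b : line; p1,p2 : integer) : integer;
(* gives -1, 0 or +1 as a is below, level with or above b *)
(* on the characters in positions p1 to p2 *)
var k, res : integer;
begin
res := 0; k := p1;
while (res = 0) and (k <= p2) do
  begin
  if a[k] < b[k] then res := -1
  else if a[k] > b[k] then res := 1;
  k := k + 1;
  end;
keycomp := res;
end; (* of keycomp *)

function less2 (var a,b : line) : boolean;
(* major key in positions 11 to 14, minor key in positions 1 to 8 *)
var c : integer;
begin
c := keycomp(a,b,11,14);
if c = 0 then c := keycomp(a,b,1,8);
less2 := c < 0;
end; (* of less2 *)

begin
x := 'yardmile  proc  7.2 ';
y := 'average   part  6.4 ';
z := 'highfact  func  7.5 ';
if less2(z,y) then writeln('func comes before part');
if less2(y,x) then writeln('part comes before proc');
if not less2(x,z) then writeln('proc does not come before func');
end.
```

A fuller version would hold the key positions in a small table filled in from the user's replies, so that three or four keys could be handled by the same loop.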

The second improvement necessary before releasing this program on the 
unsuspecting world as a professional software product concerns efficiency rather 
than usability. The results of the monitoring already discussed are relevant here, 
but the biggest improvement in speed is likely to come from including an internal 
sort to cut down the number of merge phases, as mentioned in Section 13.1. 
Because disc access is slow compared to main-memory access this could 
dramatically speed up sorting of large files. 

This entails altering the DISTRIBU procedure so that it does not merely 
distribute subsequences as it finds them, but creates them by reading in, say, a 
few hundred lines and sorting them, then putting the subsequences generated onto 
tapes 1 and 2 in turn as before. (NB The last group is likely to be less than full size.) 

To assist you in this task, and to support the laudable aims of CABS (Campaign 
Against Bubble Sort), here are two very efficient internal sorting procedures — 
Quicksort, due to Hoare (1962), and Heapsort, due to Floyd (1964). (Yes, all right,
I confess I have used a Bubble Sort, even published one (Forsyth, 1978); but now 
I know better and am trying to make amends.) 

Quicksort is generally a little faster but Heapsort has a better worst-case 
behaviour. In other words there is no data distribution that can make Heapsort 
perform really badly, though certain pathological, but rare, arrangements of data 
can slow Quicksort down drastically. Also Heapsort is not recursive. In Pascal 
recursion is handled by the system but in other languages, e.g. Fortran, it is 
extremely hard to implement; so Heapsort may be preferable on those grounds. 

More information on internal sorting can be found in books by Wirth and by 
Horowitz and Sahni, among others (Wirth, 1976; Horowitz and Sahni, 1977). 


13.4.1 Quicksort 


The essence of Quicksort is distribution: it is a distributive sort. Distributive 
methods divide the input data into two or more equivalence classes, and then 
subdivide those classes if they are still not small enough. The old mechanical 
card sorters worked on this basis: the cards were divided into ten piles (0 . . 9) 
according to the leading digit then re-input in group order and divided again 
into ten piles on the next digit — and so on. 

The Quicksort algorithm works by picking a particular value, the pivot, and 
then partitioning the vector so that all items to the left of the pivot are less than 
or equal to it and all those on the right are greater, and then applying the same 
method (recursively) to the two partitions . . . and so on until the subgroups are 
trivial (0 or 1 members) and do not need to be further subdivided. The nearer
the pivot is to the actual median value, the better Quicksort performs. 

The procedure QUIX below incorporates a refinement that saves space (on the 
subprogram stack, invisible to the user) by always applying the recursion to the 
smaller of the two partitions first. 


type linebloc = array [1 .. maxlines] of line;
(* maxlines is the number of lines in a block, e.g. 200 *)


procedure quix (var v : linebloc; l,r : integer);
(* quicksort on items in vector v *)
var i,j,m : integer; item, temp : line;


begin
if r > l then (* worth sorting, more than 1 item *)
  begin
  m := (r+l) div 2; (* midpoint *)
  item := v[m]; (* the pivot *)


  i := l; j := r; (* lower, upper limits *)
  repeat
    while less(v[i],item) do i := i+1;
    (* move up to first item in wrong half *)
    while less(item,v[j]) do j := j-1;
    (* move down to first item in wrong half *)
    if i <= j then
      begin (* exchange them *)
      (* a separate temporary is used so that the pivot is not overwritten *)
      temp := v[j]; v[j] := v[i]; v[i] := temp;
      i := i+1; j := j-1;
      end;
  until j < i; (* two sweeps have crossed *)
  (* partitioning now complete *)


  if (j-l) < (r-i) then
    begin (* sorting smaller subgroup first saves space *)
    quix(v,l,j); (* sort lower partition *)
    quix(v,i,r); (* sort upper partition *)
    end
  else (* other way round *)
    begin
    quix(v,i,r); (* upper partition *)
    quix(v,l,j); (* lower partition *)
    end;
  end;
end; (* of quix *)


13.4.2 Heapsort 


The central idea of Heapsort is to find the largest (smallest) item remaining to be 
sorted and swap it with the final item, repeating the process till the number 
remaining is only 1. It gains over simple selection (Chapter 8), which scans through 
all the remaining data each time to locate the maximum, by using a more 
sophisticated selection method. It plays a kind of tournament to select the largest 
item. 

First the data is made into a ‘heap’. A heap is organized like a binary tree where 
each element, or ‘node’, has at most two descendants. A node without descendants 
is called a ‘leaf’. Such a tree can be stored in a linear vector by the convention that 
for node [i] the two descendants are at positions [i*2] and [i*2 + 1]. It
becomes a heap when it is arranged such that neither descendant is greater than
its parent node. This is done by repeated application of procedure SIFT below.
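
The storage convention can be tried out on a small vector. The program below is a sketch that uses integers in place of lines and an invented function ISHEAP; it only tests whether a vector satisfies the heap condition and does not do any sorting.

```pascal
program heapdemo;
(* checks the heap condition for a tree stored in a vector, where the *)
(* descendants of node i are at positions 2*i and 2*i+1 *)
const size = 7;
type vector = array [1..size] of integer;
var v : vector;

function isheap (var v : vector; n : integer) : boolean;
var i : integer; ok : boolean;
begin
ok := true;
for i := 1 to n div 2 do   (* nodes beyond n div 2 are leaves *)
  begin
  if v[2*i] > v[i] then ok := false;
  if 2*i + 1 <= n then
    if v[2*i+1] > v[i] then ok := false;
  end;
isheap := ok;
end; (* of isheap *)

begin
v[1] := 9; v[2] := 7; v[3] := 8; v[4] := 3;
v[5] := 6; v[6] := 2; v[7] := 5;
if isheap(v,size) then writeln('heap') else writeln('not a heap');
v[2] := 1;   (* now smaller than its descendants 3 and 6 *)
if isheap(v,size) then writeln('heap') else writeln('not a heap');
end.
```

The first test reports a heap; after node 2 is lowered below its descendants the second test does not.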

Once the data is organized as a heap, the greatest item can be picked off — the 
root — and the next greatest found very quickly by, in effect, replaying only those 
parts of the tournament in which the previous winner was involved. This is done by 
a single call of SIFT. (This method is also sometimes known as Treesort3.) 


procedure heapsort (var v : linebloc; n : integer);
(* sorts v into ascending order 1 .. n by heapsort *)
var i : integer; item : line;

  procedure sift (r,k : integer);
  (* orders tree where all nodes except root are in place *)
  label 99;
  var j : integer; root : line;


  begin
  root := v[r]; (* copy root of tree *)
  while 2*r <= k do (* while not at a leaf *)
    begin
    j := 2*r;
    if j < k then (* 2 descendants *)
      if less(v[j],v[j+1]) then (* out of order *)
        j := j+1; (* pick the greater *)
    if not less(root,v[j]) then goto 99; (* time to quit *)


    v[r] := v[j];
    r := j; (* move towards leaves *)
    end;
  99:
  v[r] := root;


  end; (* of sift *)


begin (* main body of heapsort *)
(* first order left and right subtrees, i.e. form a heap *)
i := n div 2;
while i > 1 do
  begin
  sift(i,n);
  i := i-1;
  end; (* root now in place *)
(* repeatedly select largest and place at end *)
i := n;
while i > 1 do (* sift up and swap *)
  begin
  sift(1,i); (* pick biggest *)
  item := v[1];
  v[1] := v[i];
  v[i] := item; (* transfer to end *)
  i := i-1; (* one fewer to deal with *)
  end;
end; (* of heapsort *)


14 Case study 2 (Finding the shortest path)


Finding the minimum-cost route through a network is an important problem in 
Artificial Intelligence, Operational Research and many other fields. It is an 
interesting theoretical exercise with significant practical applications. 

Many problems and puzzles can be viewed as an attempt to find the shortest 
route through a network or graph — including freight-planning, Rubik’s cube, 
theorem-proving and the ‘travelling salesman problem’. 

As an example we have chosen a problem that Londoners, Muscovites, New 
Yorkers, Parisians and others face and solve daily: what is the quickest way by 
underground from where I am to where I want to go? 

It is a real-world problem with many interesting irregularities and peculiarities; 
and yet, as we shall see, there is a published algorithm that will solve it, provided we 
can represent the information conveyed by the diagram of lines in some suitable 
form. 


14.1 The urban transit problem


The London Underground is one of the most complex urban transportation systems 
in the world. Every day well over one million people travel on it. What we want is a 
program which will ask the intending traveller where he or she is and where he or 
she wants to go. It will then work out and display the best way from the given 
starting point to the destination. Although London will be our prime concern, we 
must not be too insular: we will not be satisfied unless the route-planner works also 
for the Paris Metro; then we can have confidence that it could be easily extended to 
deal with Moscow, New York and other subway networks. 

The best route will minimize the cost of travel in some measurable units. For 
simplicity our program will take account of only two measures — the number of 
intermediate stations and the number of line crossings. A line crossing occurs when 
the passenger has to change from one line to another at an interchange station. It
always involves a change of trains, a walk and a wait. For example, to get from 
Marble Arch to Green Park one must change from the Central to the Jubilee Line at 
Bond St. 

We must add one proviso to this very general specification: the program must be 
‘user-friendly’. That is to say, when the user types 


BAKER ST      for  BAKERSTREET    or
ST PAULS      for  ST.PAUL’S      or
CANNONBURY    for  CANONBURY      or even
LEDRU ROLLIN  for  LEDRU-ROLLIN


the program should not just spit out something like 
** INCORRECT SPELLING **


but should make an attempt to guess what was meant. As we shall see our old friend 
the Binary Search enables us to build in a relatively effective recovery mechanism 
for slight misspellings extremely cheaply. This makes the system more resilient and 
thus more usable. 

We can already sketch in an outline flowchart (Fig. 14.3). 


14.2 The A-star algorithm 


There already exists a search procedure for graph traversal called the A* (A-star)
Algorithm. It is applicable to graphs consisting of nodes (e.g. stations) connected 
by arcs (e.g. lines) with a cost attached to each arc. It is optimal in the sense that it 
is guaranteed to find a lowest-cost route and, subject to certain constraints, it never 
examines more nodes than any other procedure using the same information that is 
also guaranteed to find a least-cost route (Nilsson, 1971; Raphael, 1976). We are 
going to use it. 

(Do not be dismayed if you cannot understand it fully at first reading. Neither 
could I. All you need to know to benefit from the rest of the chapter is roughly 
what it does, not exactly how it does it.) 

The A* algorithm selects one node at a time for examination. This node (n) is
the one that minimizes a quantity f(n) which is defined as 


f(n) = c(n) + e(n) 


where c(n) is the cost of reaching n by the shortest route yet discovered and e(n) is 
the estimated cost of getting from node n to the final destination. (In other words 
f(n) assesses the cost of the best route from start to goal through node n.) Thus A* 
tries to take into account both past and future traversal costs in selecting from those 
nodes under investigation which to process next. This means that we need some 
method of estimating the distance still to be covered (without doing further search). 
Fortunately such an estimator is to hand in our case, namely the straight-line 
distance between stations as given by their map references. For instance the distance
from Tottenham Court Rd (J6) to St Pauls (L6) is 2. (The number of stops is
actually 3.)


Figure 14.3 Outline flowchart (boxes include ‘Work out best route between them’)

It is essential that e(n) should never be an overestimate. If it overestimates the 
future work the algorithm may ignore a node on the true minimum-cost path. 
Therefore if e(n) errs it should err on the side of underestimation. Simple Euclidean 
distance fulfils this condition in all but a few pathological cases on the Underground 
map. (Sorry — no prizes for finding them.) 
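
As an illustration of such an estimator, here is a sketch of a function that works out the straight-line distance between two map references. It is not the function used in the program of Section 14.6: the name DISTANCE is invented, the column letters are assumed to be consecutive in the character set, and the result is rounded down so that the estimate can never exceed the true straight-line distance.

```pascal
program mapdist;
(* sketch of the estimator e(n): straight-line distance between two *)
(* map references such as J6 and L6, measured in grid squares *)

function distance (h1 : char; v1 : integer;
                   h2 : char; v2 : integer) : integer;
var dx, dy : integer;
begin
dx := ord(h1) - ord(h2);   (* columns are lettered *)
dy := v1 - v2;             (* rows are numbered *)
distance := trunc(sqrt(dx*dx + dy*dy));   (* rounded down *)
end; (* of distance *)

begin
writeln('J6 to L6: ', distance('J',6,'L',6):3);
writeln('K5 to K7: ', distance('K',5,'K',7):3);
end.
```

For Tottenham Court Rd (J6) and St Pauls (L6) this gives 2, the figure quoted above.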

Here is the algorithm, adapted from Nilsson (1971). 


1. Put the start node s on a list called OPEN, and compute f(s). 

2. If OPEN is empty, exit with failure; otherwise continue. 

3. Remove from OPEN the node n with smallest f and put it on a list called 
SHUT. (Resolve ties for minimal f arbitrarily, but always in favour of a goal 
node.) 

4. If n is a goal node, exit and trace solution path back via pointers; otherwise 
continue. 

5. Generate all successors of n, computing for each one (x)

   f(x) = c(n) + cost(n to x) + e(x).

   If there are none, go back to Step 2.

6. Associate with the successors not already on OPEN or SHUT the f values just 
computed. Put them on OPEN and direct pointers back to n. 

7. Associate with those successors that were already on OPEN or SHUT the f 
values just computed if smaller than the previous f value. Put on OPEN those 
whose f values were thus lowered and redirect their pointers to n. Leave the 
rest alone. 

8. Go back to Step 2. 
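

The eight steps can be sketched in Pascal on a small made-up graph. This is an illustration under stated assumptions, not the route finder developed later in the chapter: the five-node network, its arc costs and its estimates are invented, the lists OPEN and SHUT are represented by a status value held for each node, and all the identifiers are hypothetical.

```pascal
program astar;
(* sketch of the eight steps above on a small made-up graph *)
const n = 5;               (* number of nodes in the illustration *)
      none = 0;            (* marks 'no arc' and 'no node' *)
      start = 1; goal = 5;
type status = (unseen, onopen, onshut);
var arc : array [1..n, 1..n] of integer;   (* arc costs, none if no arc *)
    e, c, f, back : array [1..n] of integer;
    state : array [1..n] of status;
    i, j, here, x, newc : integer;
    finished : boolean;

function smallest : integer;
(* step 3: the open node with least f, or none if OPEN is empty; *)
(* ties are resolved in favour of the goal node *)
var k, best : integer;
begin
best := none;
for k := 1 to n do
  if state[k] = onopen then
    if best = none then best := k
    else if (f[k] < f[best]) or ((f[k] = f[best]) and (k = goal)) then
      best := k;
smallest := best;
end; (* of smallest *)

begin
for i := 1 to n do
  begin
  state[i] := unseen; back[i] := none; c[i] := 0; f[i] := 0;
  for j := 1 to n do arc[i,j] := none;
  end;
(* an invented network in which every arc can be used both ways *)
arc[1,2] := 1; arc[2,1] := 1;   arc[1,3] := 4; arc[3,1] := 4;
arc[2,3] := 1; arc[3,2] := 1;   arc[2,4] := 5; arc[4,2] := 5;
arc[3,5] := 3; arc[5,3] := 3;   arc[4,5] := 1; arc[5,4] := 1;
(* estimates of the remaining cost; none exceeds the true cost *)
e[1] := 4; e[2] := 3; e[3] := 3; e[4] := 1; e[5] := 0;

c[start] := 0; f[start] := e[start];            (* step 1 *)
state[start] := onopen;
finished := false;
repeat
  here := smallest;                             (* steps 2 and 3 *)
  if here = none then finished := true
  else
    begin
    state[here] := onshut;
    if here = goal then finished := true        (* step 4 *)
    else
      for x := 1 to n do                        (* step 5 *)
        if arc[here,x] <> none then
          begin
          newc := c[here] + arc[here,x];
          if (state[x] = unseen) or (newc + e[x] < f[x]) then
            begin                               (* steps 6 and 7 *)
            c[x] := newc; f[x] := newc + e[x];
            back[x] := here; state[x] := onopen;
            end;
          end;
    end;
until finished;                                 (* step 8 *)

if here = none then writeln('no route')
else
  begin
  writeln('cost of best route: ', c[goal]:3);
  x := goal;
  while x <> none do   (* trace the path back via the pointers *)
    begin
    write(x:3); x := back[x];
    end;
  writeln;
  end;
end.
```

On this network the search finds the route 1, 2, 3, 5 at a cost of 5 and prints it in reverse, since the pointers lead from the goal back to the start.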


14.3 The route finder


We now know enough about the task to write the outline of our main program in 
Pascal. Stripped of all irrelevancies, it looks like this. 


begin
readdata(tubefile);
(* read tube map from file *)


repeat
  getnames(startoff,goal);
  (* obtain start & destination from user *)


  init(startoff,goal);
  (* clear pointers & open first node *)


  stop := false;
  repeat (* heuristic search *)
    here := smallest;
    (* find most promising open node *)


    if (here = startoff) or (here = 0) then
      stop := true (* finished *)
    else
      newnodes(here);
      (* generate successors & close current node *)
  until stop;


  if here = 0 then
    writeln('no way through!')
  else
    backtrax(here); (* show the way *)


until (* user says it's time to stop *);
end.




This skeleton introduces 5 variables and 6 subprograms. The variables are


TUBEFILE   the station and linkage data on file
STOP       a Boolean flag to terminate search
GOAL       the destination station
HERE       the current node (n)
STARTOFF   the starting point.


The subprograms are


READDATA   a procedure to read the map into internal storage
GETNAMES   a procedure to obtain the user’s starting point and goal
INIT       a procedure to prepare for the search
SMALLEST   the function that finds the least-cost node on OPEN
NEWNODES   the procedure that generates successors and closes the current
           node
BACKTRAX   a procedure to show the route found.


Notice that A* enables us to trace a path backwards from goal to starting point.
To avoid confusing the user the program will search from the destination to the 
start; then BACKTRAX will give the route in the desired direction. 


14.4 Data representation 


We cannot go much further without some attention to data structuring. Indeed a 
data decision has crept in already since by (HERE=0) we are implying that HERE 
is a variable of integer type or integer subrange. (That is how I did it, but it is not 
the only way.) 

The two major data-structure decisions are: how do we represent the stations, 
and how do we manage the lists OPEN and SHUT? 

Though Nilsson calls OPEN a ‘list’ it is evident that the ideal data type for it is 
the set. However, there are 280 stations on the underground network and a few 
dozen British Rail stations we might wish to include as well. I know of no Pascal 
implementation that allows sets with so many members. Regrettably therefore, we 
are thrown back on a less appropriate data structure. We could have OPEN as an 
array of sets; we could represent it as a packed Boolean array — one element per 
station; or we could use a linked list, as the original specification suggested. 

If you look ahead to Section 14.6 you will see that in the end I chose none of 
these, though all are feasible. It turned out to be more convenient to add a Boolean 
field to the record for each station that indicated whether it was open (i.e. in line 
for further investigation). Whether a non-open node had been shut or not could be 
determined by whether it had a backward pointer or not. 

This brings us back to the representation of stations. 


14.4.1 Internal representation


The London Underground does not change very fast. From time to time new 
stations are opened (e.g. the extension to Heathrow Airport) and new connections 
are made (e.g. the Jubilee Line), and existing routes may be closed (e.g. the Epping- 
to-Ongar branch). But during a run of the program the network is unchanging. This 
suggests a static data structure — such as a vector of records, with one record per 
station. 

Certain information, however, does vary during program execution: as the search 
fans outwards new nodes are added to the plausible pathways. This suggests a 
hybrid data structure, with a fixed backbone of stations interlinked by a dynamically 
changing system of pointers. 

The static part is easier to deal with first. For each station we need to record 


its name
its map reference
the set of lines to which it belongs
    (to enable detection of line crossings)
the list of neighbouring stations
a pointer to the path it is on (if any)
whether it is open or not
    (i.e. under consideration during search).


This translates fairly directly into the Pascal record structure below. 


station = record
            name : word;
            horz : char;      (* map column letter *)
            vert : 1..15;     (* map row number *)
            lineset : set of line;
                              (* lines passing through the station *)
            connects : list;  (* connecting stations *)
            link : pointer;   (* path for traversal *)
            open : boolean;   (* whether under consideration *)
          end; (* of station *)


The entire network can be held as 
stations : array [1..maxstats] of station


which, to allow rapid search and a degree of error correction, is ordered
alphabetically on station name.

(It should be stressed that this is not the only acceptable data organization. 
There just is not space to discuss the others adequately. The reader is invited to 
devise at least one alternative — perhaps based on lines rather than stations primarily 
— and consider its ramifications.) 

The fields that need further elaboration are CONNECTS, the list of neighbouring
stations, and LINK, a pointer to the lowest-cost path through this station, if any. 
These are the two dynamic parts of an otherwise static structure. 

CONNECTS just points to a chain of items of type LISTITEM consisting of the 
station number — i.e. the index in array STATIONS — and a pointer to the next 
item, which is NIL if there are no more. No London station has more than 7 
immediately adjacent stations (Baker St and Kings Cross sharing this record) so it 
would have been feasible to have a fixed-size array [1..7] of connecting stations;
but the list representation was felt to be more economical and flexible. (In Paris 
both Montparnasse and Republique have 8 neighbours, and stations in other cities 
may have more: any fixed upper limit is likely either to waste too much space or to 
fail in certain unusual cases.) 
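
A chain of this kind might be declared and traversed as follows. This is a sketch under assumptions: the type names LIST and LISTITEM come from the text, but the field names STATNUM and NEXT, the procedure ADDSTAT and the station numbers used are invented for the illustration.

```pascal
program neighbours;
(* sketch of the CONNECTS chain: each item holds a station number *)
(* and a pointer to the next item, which is NIL at the end *)
type list = ^listitem;
     listitem = record
                  statnum : integer;   (* index in array STATIONS *)
                  next : list;
                end;
var connects, p : list;
    count : integer;

procedure addstat (var chain : list; num : integer);
(* puts a new item at the front of the chain *)
var item : list;
begin
new(item);
item^.statnum := num;
item^.next := chain;
chain := item;
end; (* of addstat *)

begin
connects := nil;
addstat(connects,48);   (* station numbers here are invented *)
addstat(connects,50);
addstat(connects,65);
count := 0;
p := connects;
while p <> nil do   (* follow the chain to its end *)
  begin
  write(p^.statnum:4);
  count := count + 1;
  p := p^.next;
  end;
writeln;
writeln(count:2, ' neighbours');
end.
```

Because each item is created only when needed, a station with two neighbours occupies two items and one with eight occupies eight, with no fixed upper limit built in.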

The field LINK is the part that alters as the search progresses. Before going into 
details, let us clarify the discussion by depicting a particular record as it might 
appear before the search commences. It is STATIONS[78] — representing King’s 
Cross & St. Pancras. 


name      KINGS CROSS
horz      K
vert      5
lineset   [CIRCULAR, METROPOL, NORTHERN, PICCADILLY, VICTORIA]
connects  chain of station numbers, one item each for
          (EUSTON), (EUSTON SQ), (RUSSELL SQ), (FARRINGDON),
          (ANGEL), (HIGHBURY), (CALEDONIAN RD)
link      NIL
open      FALSE


These numbers applied when the data file had 160 stations.

Because LINK=NIL this represents a node that has yet to be opened. If, however,
Kings Cross had been traversed on a search originating, say, from Finsbury Park via 
Highbury towards Euston the link field would have been set to point to the pathway 
record below. 


cost       2            (stops from Finsbury Park)
estimated  1            (distance to Euston)
whichway   [VICTORIA]   (line being followed)
camefrom   65           (last stop Highbury)


The OPEN field would be set to TRUE when this link was created. 


14.4.2 The data on file 


This internal representation differs somewhat from the external representation on 
file. It is convenient to hold the station data on a text file which can be amended
using a standard editor and listed on a printer for inspection. The format adopted
for each station was:


station name
map reference
set of lines
one or more neighbouring stations, by name
dot (fullstop).


Since stations have a variable number of neighbours the dot is used as terminator. 
The first four entries of file TUBE.MAP are shown below to illustrate the layout. 


KINGS CROSS
K5
[CIRCULAR,METROPOL,NORTHERN,PICCADILLY,VICTORIA]
CALEDONIAN RD LT
HIGHBURY
ANGEL
FARRINGDON
RUSSELL SQ
EUSTON SQ
EUSTON
.

RUSSELL SQ
K5
[PICCADILLY]
KINGS CROSS
HOLBORN
.

HOLBORN
K6
[CENTRAL,PICCADILLY]
RUSSELL SQ
CHANCERY LANE
ALDWYCH
COVENT GARDEN
TOTTENHAM COURT RD
.

ALDWYCH
K7
[PICCADILLY]
HOLBORN
.

No punctuation marks or accents (except the apostrophe in French names) were
used in station names, only letters, and the following abbreviations were
systematically employed:


AVE   avenue
RD    road
SQ    square
ST    saint, street.


It is the job of procedure READDATA to read this information and arrange it 
internally as the search program requires. This involves two main changes: (1) the 
data are ordered by name; (2) the connecting stations are referred to by number, 
not name. (Internally the program refers to stations by number to save time and 
space.) 

Task (2) could not be performed until (1) was completed, and this led to a two- 
pass input — as in a compiler where the identifiers are read into a symbol table in 
the first pass and used on the second. 

READDATA turned out to be the biggest procedure in the whole program, 
though conceptually it is quite simple. It uses a rather inefficient method of ordering 
the stations on the first pass: each time one is read its rank position is determined 
by SEEK (the binary search function) and all following stations are moved up one 
space to make room for its insertion. With 220 stations the delay is scarcely 
noticeable (on the DEC System-10) but any reader who wants to make it quicker is 
recommended to alter it so that it reads all the stations in, then sorts them using an 
efficient method such as Quicksort (Section 13.4.1), and then proceeds with the 
second pass as before. MAKEROOM could then be removed, though the program 
will become longer as a result. 
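
The first-pass method can be sketched as follows. The code is an illustration, not the text of READDATA: FINDPOS and ADDNAME are invented stand-ins for SEEK and MAKEROOM, whose real definitions appear in the program listing, and a string type is assumed, as offered by many dialects though not by standard Pascal.

```pascal
program ordered;
(* sketch of ordering on input: each name is placed in alphabetical *)
(* order by a binary search followed by a shift to make room *)
const maxstats = 10;
type stname = string[20];   (* assumes a dialect with a string type *)
var names : array [1..maxstats] of stname;
    nstats, k : integer;

function findpos (target : stname) : integer;
(* binary search: gives the position of target if it is present, *)
(* otherwise minus the position at which it should be inserted *)
var lo, hi, mid, found : integer;
begin
lo := 1; hi := nstats; found := 0;
while (lo <= hi) and (found = 0) do
  begin
  mid := (lo + hi) div 2;
  if names[mid] = target then found := mid
  else if names[mid] < target then lo := mid + 1
  else hi := mid - 1;
  end;
if found > 0 then findpos := found else findpos := -lo;
end; (* of findpos *)

procedure addname (newname : stname);
(* inserts newname in its proper place unless already present *)
var place, k : integer;
begin
place := findpos(newname);
if place < 0 then
  begin
  place := -place;
  for k := nstats downto place do
    names[k+1] := names[k];   (* move later names up one space *)
  names[place] := newname;
  nstats := nstats + 1;
  end;
end; (* of addname *)

begin
nstats := 0;
addname('KINGS CROSS'); addname('RUSSELL SQ');
addname('HOLBORN'); addname('ALDWYCH');
for k := 1 to nstats do writeln(names[k]);
end.
```

The four names, given in file order, come out as ALDWYCH, HOLBORN, KINGS CROSS, RUSSELL SQ. The shifting loop is what makes the method slow for large files, since each insertion may move every name already stored.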

Incidentally READDATA makes use of the fact that sets can be read in DEC 
System-10 Pascal by READ and READLN. This is not standard. On other systems
additional procedures for reading and writing sets may be needed.

The lines recognized (type LINE) are the ten underground lines


BAKERLOO
CENTRAL
CIRCULAR     Circle Line
DISTRICT
DOCKLAND     East London Line
JUBILEE
METROPOL     Metropolitan Line
NORTHERN
PICCADILLY
VICTORIA


two railway lines


RAILWAY1     G.N. Electrics
RAILWAY2     Broad St to Richmond Line (North London Line)


and


WALKING


which is a dummy line used to connect Bank with Monument (escalator link) and
Broad St with Liverpool St. (The Paris lines are numbered, and names NUMERO1 ..
NUMERO13 were included for them, as well as three express rail lines that
interconnect with the Metro.)


14.5 User interface 


The job of GETNAMES is to convert from a pair of station names typed by the 
user to a pair of numbers indexing those stations in the main data array. If the user 
types an unrecognized name the binary search fails at a certain point, and the 
program asks whether the immediately preceding or immediately following station 
was intended. This frees the user from having to remember the exact spelling down 
to the last hyphen or apostrophe. It catches a surprisingly high proportion of natural 
misspellings, since the first few letter ar normall correc. It also allows the 
knowledgeable user to abbreviate deliberately — e.g. Elephant for Elephant and 
Castle, Liverpool for Liverpool St, Hyde Park for Hyde Park Corner, even Morn for 
Mornington Crescent. 

This had the unexpected side effect of making randomized testing rather easy. 
For instance typing HOME to WORK would be construed as Holloway Rd to 
Woodford and EARTH to MARS as Earls Court to Marylebone if the user assented, 
and the routes worked out accordingly. 

Notice that SEEK, the Binary Search function, has been streamlined since you 
last met it as LOOK in Chapter 9. There is no harm in learning as you get older. 
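
The way a failed search still lands next to the intended name can be tried out in isolation. The sketch below is a cut-down stand-in for SEEK, not the version in the listing: it assumes the STRING type found in Turbo and Free Pascal (not standard Pascal), passes the name by value, and searches a five-entry table invented for the example.

```pascal
program nearest (output);
(* sketch: where a failed binary search ends up *)
const ns = 5;
type stname = string[20];   (* assumption: Turbo/Free Pascal string type *)
var
  stations : array [1..ns] of stname;
  found : boolean;
  n : integer;

function seek (w : stname; var test : boolean) : integer;
(* binary search; returns the last midpoint probed, found or not *)
var hi, lo, midpoint : integer;
begin
  lo := 1;  hi := ns;  test := false;
  repeat
    midpoint := (hi + lo) div 2;
    if w > stations[midpoint] then lo := midpoint + 1
    else if w < stations[midpoint] then hi := midpoint - 1
    else test := true
  until test or (hi < lo);
  seek := midpoint
end;

begin
  (* the table must be in alphabetical order *)
  stations[1] := 'HAMPSTEAD';
  stations[2] := 'HEATHROW CENTRAL';
  stations[3] := 'HIGH ST KENSINGTON';
  stations[4] := 'HOLBORN';
  stations[5] := 'HYDE PARK CORNER';
  n := seek('HYDE PARK', found);
  if not found then
    writeln('DO YOU MEAN ', stations[n], '?')
end.
```

With this table the search probes entries 3, 4 and 5 and stops at the last of them, so the program prints DO YOU MEAN HYDE PARK CORNER? An abbreviation sorts immediately before the full name, which is why the last entry probed is so often the one intended.
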


14.6 The program itself 


By now you should have had enough preliminary explanation to make sense of the 
program. But first let us see it in action. 


14.6.1 Sample run 


This is a run on the London data, with 160 stations known. (The length of the first 
journey is given as 18 steps, though there are only 10 stops on it. This is because it 
has 2 line changes; and each change receives a penalty equivalent to 4 additional 
stops.) 
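
The step counts reported in the run can be checked by hand with the same rule. This is a minimal sketch; the function JOURNEYCOST is an illustrative name that does not occur in the program, though PENALTY has the value used in the listing.

```pascal
program steps (output);
(* sketch: how the reported number of steps is made up *)
const penalty = 4;    (* same value as in JOURNEYS *)

function journeycost (stops, changes : integer) : integer;
(* each line change counts as PENALTY extra stops *)
begin
  journeycost := stops + changes * penalty
end;

begin
  writeln(journeycost(10, 2):3);   (* Hampstead to Hampstead Heath *)
  writeln(journeycost(4, 1):3)     (* Hyde Park Corner to Regents Park *)
end.
```

This prints 18 and 8, agreeing with the first two journeys shown below.
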


[JOURNEYS EXECUTION] 
TUBEFILE = TUBE.MAP 
160 STATIONS READ FROM FILE. 
DO YOU WANT TO SEE STATION DATA? NO 
WHICH STATION ARE YOU AT? HAMPSTEAD 
WHAT IS YOUR DESTINATION? HAMPSTEAD HEATH 
FROM STATION 59 HAMPSTEAD 
TO STATION 60 HAMPSTEAD HEATH 
MAP DISTANCE 0 
IS YOUR JOURNEY REALLY NECESSARY? 


NUMBER OF NODES EXAMINED= 32 


THE BEST ROUTE TAKES 18 STEPS. 
THE OPTIMAL ROUTE IS AS FOLLOWS: 


HAMPSTEAD            [ NORTHERN] 
BELSIZE PARK         [ NORTHERN] 
CHALK FARM           [ NORTHERN] 
CAMDEN TOWN          [ NORTHERN] 
EUSTON               [ NORTHERN] ** LINE CHANGE ** 
KINGS CROSS          [ VICTORIA] 
HIGHBURY             [ VICTORIA] ** LINE CHANGE ** 
CALEDONIAN RD BR     [ RAILWAY2] 
CAMDEN RD            [ RAILWAY2] 
GOSPEL OAK           [ RAILWAY2] 
HAMPSTEAD HEATH      [ RAILWAY2] 


HAVE A GOOD TRIP! 

ANOTHER PAIR OF STATIONS (Y/N) ? Y 

WHICH STATION ARE YOU AT? HYDE PARK 
HYDE PARK NOT KNOWN. 

DO YOU MEAN HYDE PARK CORNER? Y 
WHAT IS YOUR DESTINATION? REGENTS PARK 
FROM STATION 71 HYDE PARK CORNER 
TO STATION 111 REGENTS PARK 

MAP DISTANCE 2 


NUMBER OF NODES EXAMINED= 7 


THE BEST ROUTE TAKES 8 STEPS. 
THE OPTIMAL ROUTE IS AS FOLLOWS: 


HYDE PARK CORNER     [PICCADILLY] 
GREEN PARK           [PICCADILLY] 
PICCADILLY CIRCUS    [PICCADILLY] ** LINE CHANGE ** 
OXFORD CIRCUS        [ BAKERLOO] 
REGENTS PARK         [ BAKERLOO] 


HAVE A GOOD TRIP! 


ANOTHER PAIR OF STATIONS (Y/N) ? Y 

WHICH STATION ARE YOU AT? HEAVEN 
HEAVEN NOT KNOWN. 

DO YOU MEAN HIGH ST KENSINGTON? Y 
WHAT IS YOUR DESTINATION? HELL 

HELL NOT KNOWN. 

DO YOU MEAN HIGH ST KENSINGTON ? N 

DO YOU MEAN HEATHROW CENTRAL? Y 
FROM STATION 64 HIGH ST KENSINGTON 
TO STATION 63 HEATHROW CENTRAL 
MAP DISTANCE 5 


NUMBER OF NODES EXAMINED= 5 


THE BEST ROUTE TAKES 9 STEPS. 
THE OPTIMAL ROUTE IS AS FOLLOWS: 


HIGH ST KENSINGTON   [ DISTRICT] 
EARLS COURT          [ DISTRICT] ** LINE CHANGE ** 
BARONS COURT         [PICCADILLY] 
HAMMERSMITH          [PICCADILLY] 
ACTON TOWN           [PICCADILLY] 
HEATHROW CENTRAL     [PICCADILLY] 


HAVE A GOOD TRIP! 



ANOTHER PAIR OF STATIONS (Y/N) ? Y 
WHICH STATION ARE YOU AT? WIMBLEDON 
WHAT IS YOUR DESTINATION? WIMBLEDON 
FROM STATION 156 WIMBLEDON 

TO STATION 156 WIMBLEDON 

MAP DISTANCE 0 

IS YOUR JOURNEY REALLY NECESSARY? 


NUMBER OF NODES EXAMINED= 0 


THE BEST ROUTE TAKES 0 STEPS. 
THE OPTIMAL ROUTE IS AS FOLLOWS: 


WIMBLEDON [ DISTRICT] 
HAVE A GOOD TRIP! 

ANOTHER PAIR OF STATIONS (Y/N) ? N 
EXIT 


NB I put in a direct link from Acton Town to Heathrow out of pure laziness. 
There are 8 stations in between that the program does not know about, because I 
did not bother to type them in to the data file. 


14.6.2 Program listing 


program journeys (tubefile); 
(* shortest-route program for London Underground *) 
(* by Richard Forsyth, PNL, July 1981 *) 


const 
  wordsize = 20;  maxstats = 300;   (* max. no. of stations *) 
  dump = true;                      (* debugging switch *) 
  penalty = 4;                      (* penalty for crossing between lines *) 


type 

  line = (bakerloo,central,circular,district,dockland,jubilee, 
          metropol,northern,piccadilly,railway1,railway2,victoria, 
          walking, 
          (* underground lines for London *) 
          numero1,numero2,numero3,numero3b,numero4,numero5,numero6,numero7, 
          numero7b,numero8,numero9,numero10,numero11,numero12,numero13, 
          express1,express2,express3); 
          (* for the Paris Metro *) 

  word = packed array [1..wordsize] of char; 

  pointer = ^pathway; 

  pathway = record 
              cost, estimated : integer;   (* traversal costs *) 
              whichway : set of line;      (* route actually taken *) 
              camefrom : integer;          (* backward link *) 
            end;   (* of pathway *) 

  list = ^listitem; 

  listitem = record          (* for chaining *) 
               id : integer;   (* station number *) 
               next : list;    (* rest of list *) 
             end;   (* of listitem *) 

  station = record 
              name : word; 
              horz : char;            (* map reference 1 *) 
              vert : 1..15;           (* map ref. 2 *) 
              lineset : set of line;  (* lines passing thru this station *) 
              connects : list;        (* connecting stations *) 
              link : pointer;         (* path for traversal *) 
              open : boolean;         (* whether under consideration during search *) 
            end;   (* of station *) 

var 

  stop : boolean; 
  response : char; 
  dist,here,goal,startoff,ns : integer; 
  examined : integer; 
  tubefile : text; 
  path : pointer;    (* start of route *) 
  stations : array [1..maxstats] of station; 
             (* main data array *) 

function seek (var w : word; var test : boolean) : integer; 
(* binary search for w in station names *) 
var hi,lo,midpoint : integer; 

begin 
  lo := 1;  hi := ns;    (* number of stations *) 
  test := false; 
  repeat 
    midpoint := (hi+lo) div 2; 
    if w > stations[midpoint].name then 
      lo := midpoint + 1 
    else if w < stations[midpoint].name then 
      hi := midpoint - 1 
    else test := true; 
  until test or (hi < lo); 
  seek := midpoint; 
end;   (* of seek *) 


procedure makeroom (p : integer); 
(* makes room for insertion at p *) 
var i : integer; 
    (* uses ns global *) 

begin 
  for i := ns+1 downto p+1 do 
    stations[i] := stations[i-1];    (* move up *) 
end;   (* of makeroom *) 


procedure readdata (var f : text); 
(* reads station information from file f *) 
const stopcode = '.'; 
var 
  w : word;  map1 : char; 
  map2,p,q : integer;  flag : boolean; 
  l : list;  lset : set of line; 

begin   (* first pass *) 
  ns := 0;    (* no. of stations *) 
  reset(f);  flag := false; 
  while not eof(f) and (ns<maxstats) do 
  begin 
    readln(f,w);             (* name *) 
    readln(f,map1,map2);     (* map ref. *) 
    readln(f,lset);          (* line(s) *) 
    if ns = 0 then 
      p := 1    (* first entry *) 
    else 
    begin 
      p := seek(w,flag); 
      if flag then 
        writeln('duplicate entry: ',w)    (* shouldn't happen *) 
      else 
      begin 
        if stations[p].name < w then p := p+1; 
        makeroom(p);    (* shuffle up to make space *) 
      end; 
    end; 

    (* insertion always in correct rank order *) 
    if not flag then 
    begin 
      ns := ns + 1; 
      with stations[p] do 
      begin 
        name := w; 
        horz := map1;  vert := map2; 
        lineset := lset; 
        connects := nil;    (* start with no links *) 
      end;   (* of with *) 
    end; 
    repeat 
      readln(f,w);    (* skip over connecting stations for now *) 
    until w[1] = stopcode; 
  end;   (* of 1st pass *) 

  (* second pass *) 
  reset(f); 
  while not eof(f) do 
  begin 
    readln(f,w); 
    readln(f);  readln(f);    (* not needed this time *) 
    p := seek(w,flag); 
    if not flag then writeln('data input error!'); 
      (* should never happen *) 
    repeat 
      readln(f,w); 
      if w[1] <> stopcode then 
      begin 
        q := seek(w,flag); 
        if flag then    (* ok *) 
        begin 
          new(l); 
          l^.id := q;  l^.next := stations[p].connects; 
          stations[p].connects := l; 
        end    (* new link attached *) 
        else if dump then 
          writeln(w,' not known; from ',stations[p].name); 
          (* just a warning *) 
      end; 
    until w[1] = stopcode; 
  end;   (* of 2nd pass *) 

  writeln(ns:7,' stations read from file.'); 
end;   (* of readdata *) 


procedure dumpdata; 
(* lists stations as stored for checking *) 
var i : integer;  p : list; 

begin 
  for i := 1 to ns do 
    with stations[i] do 
    begin 
      writeln(name,i:4); 
      writeln(horz,vert:2); 
      writeln(lineset); 
      p := connects; 
      while p <> nil do 
      begin 
        writeln(stations[p^.id].name,p^.id:4); 
        p := p^.next; 
      end; 
      writeln; 
    end;   (* of with *) 
end;   (* of dumpdata *) 


function distance (a,b : integer) : integer; 
(* approximate map distance between two stations *) 
var x1,x2,y1,y2 : integer;  d : real; 

begin 
  x1 := ord(stations[a].horz); 
  x2 := ord(stations[b].horz); 
  y1 := stations[a].vert; 
  y2 := stations[b].vert; 
  d := sqrt(sqr(x1-x2) + sqr(y1-y2)); 
  distance := round(d);    (* must not overestimate *) 
end;   (* of distance *) 


procedure getnames (var startoff,goal : integer); 
(* gets pair of stations from user *) 

  procedure get1name (var n : integer); 
  (* obtains station's name and finds its number *) 
  var flag : boolean;  y : char; 
      w : word; 

  begin   (* get1name *) 
    readln;  read(w);    (* get the word *) 
    n := seek(w,flag); 
    if not flag then    (* try nearest neighbour *) 
    begin 
      writeln(w,' not known.'); 
      write('do you mean ',stations[n].name,'?'); 
      readln;  read(y); 
      if y = 'y' then flag := true 
      else    (* try next one *) 
        if (stations[n].name > w) and (n>1) then 
          n := n-1    (* back one *) 
        else 
          n := n+1;   (* forward 1 *) 
    end; 
    if not flag and (n <= ns) then 
    begin   (* try one more time *) 
      write('do you mean ',stations[n].name,'?'); 
      readln;  read(y); 
      if y <> 'y' then 
        n := 0;    (* indication of failure *) 
    end; 
    if n > ns then n := 0;    (* failure *) 
    if n = 0 then writeln('please try again.'); 
  end;   (* of get1name *) 


begin   (* getnames *) 
  repeat 
    write('which station are you at? '); 
    get1name(startoff); 
  until startoff <> 0;    (* 0 means unknown name *) 
  repeat 
    write('what is your destination? '); 
    get1name(goal); 
  until goal <> 0; 
end;   (* of getnames *) 


function smallest : integer; 
(* finds least-cost open node *) 
var i,c,mincost,s : integer; 
    node : pointer; 

begin 
  s := 0;  mincost := maxint; 
  for i := 1 to ns do    (* for all stations *) 
    if stations[i].open then 
    begin   (* worth looking at *) 
      node := stations[i].link; 
      if node = nil then 
        writeln('error in smallest function') 
        (* could it happen ? *) 
      else 
      begin 
        c := node^.cost + node^.estimated; 
        if (c < mincost) or (i=startoff) and (c=mincost) then 
        begin   (* new minimum *) 
          s := i;  mincost := c; 
        end; 
      end; 
    end; 

  smallest := s; 
end;   (* of smallest *) 


procedure opennode (from,unto : integer); 
(* opens node from given station *) 
var c,e : integer;  p : pointer; 
    ways : set of line; 
    (* penalty,startoff are global *) 

begin 
  p := stations[from].link; 
  c := p^.cost + 1; 
    (* distance already covered *) 
  e := distance(unto,startoff); 
    (* distance still to do *) 
  ways := stations[unto].lineset * p^.whichway; 
  if ways = [] then    (* end of the line! *) 
  begin 
    c := c + penalty;    (* penalty for line change *) 
    ways := stations[unto].lineset * stations[from].lineset; 
  end; 
  if ways * stations[startoff].lineset = [] then 
    e := e + penalty;    (* at least 1 more change *) 

  p := stations[unto].link; 
  if p = nil then    (* normal case *) 
  begin   (* open it afresh *) 
    new(p); 
    stations[unto].open := true; 
    p^.cost := c;  p^.estimated := e; 
    p^.whichway := ways; 
    p^.camefrom := from; 
    stations[unto].link := p;    (* new link attached *) 
  end 
  else    (* station already visited *) 
    if (c+e) < (p^.cost+p^.estimated) then 
    (* new route is better than old one *) 
    begin   (* open it up *) 
      p^.cost := c;  p^.estimated := e; 
      p^.camefrom := from;    (* last stop may be different *) 
      p^.whichway := ways; 
      stations[unto].open := true; 
    end; 
    (* otherwise ignore it *) 
end;   (* of opennode *) 


procedure newnodes (s : integer); 
(* generates all 1-step links from station s *) 
var l : list;  p : pointer; 
    (* examined is global *) 
begin 
  stations[s].open := false; 
  examined := examined + 1; 
  p := stations[s].link; 
  if dump then    (* see how search is progressing *) 
  begin 
    write('closing node ',stations[s].name); 
    writeln(' cost = ',p^.cost+p^.estimated:2,' ',p^.whichway); 
  end; 
  l := stations[s].connects; 
  while l <> nil do 
  begin 
    if p^.camefrom <> l^.id then    (* don't return to self *) 
      opennode(s,l^.id); 
    l := l^.next; 
  end; 
end;   (* of newnodes *) 


procedure backtrax (here : integer); 
(* traces path back from here *) 
var p : pointer;  ways : set of line; 

begin 
  writeln; 
  p := stations[here].link; 
  ways := p^.whichway; 
  writeln('the best route takes ',p^.cost:4,' steps.'); 
  writeln('the optimal route is as follows:'); 
  writeln; 
  while p <> nil do 
  begin 
    write(stations[here].name,' ',ways); 
    here := p^.camefrom;    (* previous station *) 
    if here = 0 then 
      p := nil    (* no more to do *) 
    else 
    begin 
      ways := ways * p^.whichway; 
      if ways = [] then 
      begin 
        ways := p^.whichway; 
        write(' ** line change **'); 
      end; 
      writeln; 
      p := stations[here].link;    (* next stop *) 
    end; 
  end; 

  writeln;  writeln;  writeln('have a good trip!'); 
end;   (* of backtrax *) 


procedure init (startoff,goal : integer); 
(* prepares for new search from goal *) 
var s : integer; 
    (* also uses globals examined,ns,stations,path *) 

begin 
  for s := 1 to ns do 
  begin 
    stations[s].link := nil; 
    stations[s].open := false; 
  end; 
  if path <> nil then dispose(path); 
    (* clear storage from last time *) 
  examined := 0;    (* number of nodes examined *) 

  (* now open the first node *) 
  stations[goal].open := true; 
  new(path); 
  with path^ do 
  begin 
    cost := 0; 
    estimated := distance(goal,startoff); 
    whichway := stations[goal].lineset; 
    camefrom := 0; 
  end; 
  stations[goal].link := path; 
end;   (* of init *) 


begin   (* main program *) 

  readdata(tubefile); 
    (* read tube map from file into array of stations *) 
  path := nil;    (* no route established yet *) 
  write('do you want to see station data? '); 
  readln;    (* readln first on DEC-10 for terminal input *) 
  read(response); 
  if response = 'y' then    (* dump stations for checking *) 
    dumpdata; 

  repeat    (* main loop *) 
    getnames(startoff,goal); 
      (* get start and destination from user *) 
    writeln('from station ',startoff:4,stations[startoff].name:wordsize+2); 
    writeln('to station ',goal:4,stations[goal].name:wordsize+2); 
    dist := distance(startoff,goal); 
    writeln('map distance ',dist:4); 
    if dist < 1 then 
      writeln(' is your journey really necessary?'); 
    writeln;    (* this part just for testing *) 

    init(startoff,goal);    (* prepare for action *) 

    stop := false; 
    repeat    (* heuristic search *) 
      here := smallest; 
        (* most promising open node *) 
      if (here = startoff) or (here = 0) then 
        stop := true    (* finished *) 
      else 
        newnodes(here); 
        (* generate successors and close current node *) 
    until stop; 

    writeln('number of nodes examined = ',examined:4); 
    if here <> 0 then    (* show the route *) 
      backtrax(here) 
      (* path generated backwards from goal, 
         listed forwards from starting point *) 
    else    (* failure *) 
      writeln('no way found!'); 

    (* ask user if any more wanted *) 
    writeln; 
    write('another pair of stations (y/n) ? '); 
    readln;  read(response); 

  until response = 'n';    (* enough done *) 

end. 


14.6.3 Run with trace dump 


There is a constant DUMP which is FALSE in the production version of the 
program. When it is TRUE, which is useful during testing and debugging, the 
program prints the name of each node closed during the search, together with its 
f value, i.e. estimated total cost of the best path through that node. 

Here is a run with DUMP=TRUE which gives a good idea of how the search fans 
out along different plausible routes. In this case there are two equally good routes, 
and both are followed nearly to the end. 


[JOURNEYS EXECUTION] 
TUBEFILE = TUBE.MAP 

OLYMPIA              NOT KNOWN; FROM EARLS COURT 
WEMBLEY CENTRAL      NOT KNOWN; FROM STONEBRIDGE PARK 
ACTON CENTRAL        NOT KNOWN; FROM WILLESDEN JUNCTION 
EAST ACTON           NOT KNOWN; FROM WHITE CITY 
BOW RD               NOT KNOWN; FROM MILE END 
WILLESDEN GREEN      NOT KNOWN; FROM KILBURN 
NEASDEN              NOT KNOWN; FROM WEMBLEY PARK 
GOLDERS GREEN        NOT KNOWN; FROM HAMPSTEAD 
EAST FINCHLEY        NOT KNOWN; FROM HIGHGATE 
BOUNDS GREEN         NOT KNOWN; FROM WOOD GREEN LT 

160 STATIONS READ FROM FILE. 
DO YOU WANT TO SEE STATION DATA? N 
WHICH STATION ARE YOU AT? NOTTING HILL GATE 
WHAT IS YOUR DESTINATION? LIVERPOOL ST 
FROM STATION 100 NOTTING HILL GATE 
TO STATION 87 LIVERPOOL ST 
MAP DISTANCE 7 

CLOSING NODE LIVERPOOL ST         COST =  7 [ CENTRAL, CIRCULAR, METROPOL, WALKING] 
CLOSING NODE BANK                 COST =  7 [ CENTRAL, WALKING] 
CLOSING NODE MOORGATE             COST =  7 [ CIRCULAR, METROPOL] 
CLOSING NODE ALDGATE              COST =  8 [ CIRCULAR, METROPOL] 
CLOSING NODE BARBICAN             COST =  8 [ CIRCULAR, METROPOL] 
CLOSING NODE FARRINGDON           COST =  8 [ CIRCULAR, METROPOL] 
CLOSING NODE ST PAULS             COST =  8 [ CENTRAL] 
CLOSING NODE CHANCERY LANE        COST =  8 [ CENTRAL] 
CLOSING NODE BETHNAL GREEN        COST =  9 [ CENTRAL] 
CLOSING NODE HOLBORN              COST =  9 [ CENTRAL] 
CLOSING NODE KINGS CROSS          COST =  9 [ CIRCULAR, METROPOL] 
CLOSING NODE EUSTON SQ            COST =  9 [ CIRCULAR, METROPOL] 
CLOSING NODE GREAT PORTLAND ST    COST =  8 [ CIRCULAR, METROPOL] 
CLOSING NODE BAKER ST             COST =  9 [ CIRCULAR, METROPOL] 
CLOSING NODE EDGWARE RD           COST =  9 [ CIRCULAR, METROPOL] 
CLOSING NODE TOTTENHAM COURT RD   COST =  9 [ CENTRAL] 
CLOSING NODE OXFORD CIRCUS        COST =  8 [ CENTRAL] 
CLOSING NODE BOND ST              COST =  9 [ CENTRAL] 
CLOSING NODE MARBLE ARCH          COST =  9 [ CENTRAL] 
CLOSING NODE TOWER HILL           COST =  9 [ CIRCULAR] 
CLOSING NODE MONUMENT             COST =  9 [ CIRCULAR] 
CLOSING NODE CANNON ST            COST =  9 [ CIRCULAR] 
CLOSING NODE LANCASTER GATE       COST = 10 [ CENTRAL] 
CLOSING NODE MANSION HOUSE        COST = 10 [ CIRCULAR] 
CLOSING NODE MILE END             COST = 10 [ CENTRAL] 
CLOSING NODE PADDINGTON           COST = 10 [ CIRCULAR, METROPOL] 
CLOSING NODE QUEENSWAY            COST = 10 [ CENTRAL] 
NUMBER OF NODES EXAMINED = 27 

THE BEST ROUTE TAKES 11 STEPS. 
THE OPTIMAL ROUTE IS AS FOLLOWS: 

NOTTING HILL GATE    [ CENTRAL] 
QUEENSWAY            [ CENTRAL] 
LANCASTER GATE       [ CENTRAL] 
MARBLE ARCH          [ CENTRAL] 
BOND ST              [ CENTRAL] 
OXFORD CIRCUS        [ CENTRAL] 
TOTTENHAM COURT RD   [ CENTRAL] 
HOLBORN              [ CENTRAL] 
CHANCERY LANE        [ CENTRAL] 
ST PAULS             [ CENTRAL] 
BANK                 [ CENTRAL] 
LIVERPOOL ST         [ CENTRAL] 


HAVE A GOOD TRIP! 
ANOTHER PAIR OF STATIONS (Y/N) ? N 
EXIT 


First the program prints out ten connecting stations which have no data record. 
This shows the various places where lines are incomplete. It is also useful for 
catching typing errors in the data. 

Then the search begins, starting at Liverpool St, the destination. If you have a 
map handy you can follow the program as it pushes out along the two most 
competitive routes — one on the northern half of the Circle Line via Kings Cross and 
Baker St and the other on the Central Line via Oxford Circus. A third route on the 
southern part of the Circle line gets as far as Mansion House. The only genuine red 
herring is the extension backwards through Bethnal Green and Mile End: but this 
only amounts to two spurious nodes, and the program cannot be sure that the 
Central Line does not loop around via Mile End to Notting Hill Gate. (In certain 
circumstances it has to go back before it can advance.) 

So, of the 160 nodes existing, 27 are examined: 11 are on one minimum-cost 
path, 9 are on another equally good route, 5 are on a more roundabout way, and 
only two are dead ends. Not bad for a mere machine. 


14.6.4 Dans le Metro 


Did you know you can go by rail from Rome to Stalingrad in ten minutes without 
even changing trains? Yes folks! It can be done on the Paris Metro, Line 2. 
For the more cosmopolitan reader, here is a chance to sit back and enjoy a 
couple of imaginary tours through the evocatively named stations of the Parisian 
Metro. 

This run took place when the data file contained details of 255 stations, enough 
for all the stations within the arrondissements of Paris and a few others further out. 
To show the data format a listing of stations has been requested, but for brevity a 
large chunk of the listing is cut. 


[JOURNEYS EXECUTION] 
TUBEFILE = METRO 
255 STATIONS READ FROM FILE. 
DO YOU WANT TO SEE STATION DATA? Y 

4 SEPTEMBRE            1 
J 6 
[ NUMERO3] 
OPERA                150 
BOURSE                27 

ABBESSES               2 
J 4 
[ NUMERO12] 
PIGALLE              162 
LAMARCK              111 

[248 stations omitted here] 

VILLIERS             251 
G 5 
[ NUMERO2, NUMERO3] 
MALESHERBES          127 
MONCEAU              138 
EUROPE                73 
ROME                 211 

VINCENNES            252 
R 10 
[ EXPRESS3] 
NATION               144 
FONTENAY SOUS BOIS    79 

VOLONTAIRES          253 
F 11 
[ NUMERO12] 
VAUGIRARD            248 
PASTEUR              155 

VOLTAIRE             254 
O 8 
[ NUMERO9] 
ST AMBROISE          219 
CHARONNE              41 

WAGRAM               255 
F 4 
[ NUMERO3] 
PEREIRE              158 
MALESHERBES          127 

WHICH STATION ARE YOU AT? EUROPE 
WHAT IS YOUR DESTINATION? ARGENTINE 
FROM STATION 73 EUROPE 
TO STATION 7 ARGENTINE 
MAP DISTANCE 4 

NUMBER OF NODES EXAMINED = 19 

THE BEST ROUTE TAKES 14 STEPS. 
THE OPTIMAL ROUTE IS AS FOLLOWS: 

EUROPE               [ NUMERO3] 
VILLIERS             [ NUMERO3] ** LINE CHANGE ** 
MONCEAU              [ NUMERO2] 
COURCELLES           [ NUMERO2] 
TERNES               [ NUMERO2] 
ETOILE               [ NUMERO2] ** LINE CHANGE ** 
ARGENTINE            [ NUMERO1] 


HAVE A GOOD TRIP! 


ANOTHER PAIR OF STATIONS (Y/N) ? Y 


WHICH STATION ARE YOU AT? VICTOR HUGO 
WHAT IS YOUR DESTINATION? ALEXANDRE DUMAS 
FROM STATION 250 VICTOR HUGO 
TO STATION 4 ALEXANDRE DUMAS 
MAP DISTANCE 12 

NUMBER OF NODES EXAMINED = 30 

THE BEST ROUTE TAKES 21 STEPS. 
THE OPTIMAL ROUTE IS AS FOLLOWS: 

VICTOR HUGO          [ NUMERO2] 
ETOILE               [ NUMERO2] 
TERNES               [ NUMERO2] 
COURCELLES           [ NUMERO2] 
MONCEAU              [ NUMERO2] 
VILLIERS             [ NUMERO2] 
ROME                 [ NUMERO2] 
PLACE CLICHY         [ NUMERO2] 
BLANCHE              [ NUMERO2] 
PIGALLE              [ NUMERO2] 
ANVERS               [ NUMERO2] 
BARBES ROCHECHOUART  [ NUMERO2] 
LA CHAPELLE          [ NUMERO2] 
STALINGRAD           [ NUMERO2] 
JAURES               [ NUMERO2] 
COLONEL FABIEN       [ NUMERO2] 
BELLEVILLE           [ NUMERO2] 
COURONNES            [ NUMERO2] 
MENILMONTANT         [ NUMERO2] 
PERE LACHAISE        [ NUMERO2] 
PHILIPPE AUGUSTE     [ NUMERO2] 
ALEXANDRE DUMAS      [ NUMERO2] 


HAVE A GOOD TRIP! 
ANOTHER PAIR OF STATIONS (Y/N) ? N 
EXIT 


14.7 Concluding remarks 


I hope that at least some readers will take the trouble to understand, and if possible 
implement, this program. It does a useful job quite well. For other subway systems 
it only needs a different database, and a new set of line names. 

Having said that, it is fair to point out firstly that it is far from perfect (even 
though based on an algorithm which is in a sense ‘optimal’ it is susceptible to 
improvement) and secondly that it has a history. Let us take the second point first. 


14.7.1 Implementation schedule 


Presenting an example in a book can give a misleading impression of stability. What 
is published seems fixed and immutable, but in reality it is a snapshot of an evolving 
system at a certain stage in its evolution. It has a past and, hopefully, a future — 
when perhaps you, the reader, will extend and enhance it. 

Its past will help you understand its present. This program was not entered all at 
once and then run; it was implemented in four distinct phases. 


Day 1:          read in the data and display it; 
Day 2:          get names from user and identify them correctly; 
Day 3:          plan the route and show it; 
Days 4 and 5:   get the bloody thing working. 


If you look at the components of the program you will find them very nearly in 
chronological order — first the data declarations, then the functions and procedures 
as they were typed in and tested, lastly the main program as frozen for publication. 

On Day 1 the main program consisted essentially of what are now its first ten 
lines — just enough to read in and dump out the data from file. This involved 
subprograms SEEK, MAKEROOM, READDATA and DUMPDATA. Of course 
DUMPDATA is superfluous to the ultimate objective, but would you go ahead 
without knowing it could read its data correctly? 

Day 2 allowed the testing of routines DISTANCE, GET1NAME and GETNAMES. 
This confirmed that using the two closest names after a failure of the binary search 
was, as had been hoped, a tolerably good way of dealing with natural typing errors. 
It also revealed that the letter I was missing from the tube map diagram: column H 
was followed by J. This meant that the distance estimate was occasionally an 
overestimate of the number of stops between two stations. Some overestimates 
could be fixed by treating stations in the left half of a J square as being in a 
fictitious I square. Other than that, distance seemed a satisfactory estimator. 
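
The effect of the missing column can be seen in a few lines. The sketch below is one possible alternative to the remedy just described, not what JOURNEYS does: it converts column letters to indices so that J follows directly after H. The function COLINDEX is an invented name, and the code assumes a character set such as ASCII in which the capital letters are contiguous.

```pascal
program mapcols (output);
(* sketch: column letters to indices on a map grid that skips the letter I *)

function colindex (c : char) : integer;
(* A..H map to 1..8; J onwards close up the gap left by the missing I *)
begin
  if c < 'I' then
    colindex := ord(c) - ord('A') + 1
  else
    colindex := ord(c) - ord('A')
end;

begin
  writeln('raw difference between H and J       = ', ord('J') - ord('H'):1);
  writeln('corrected difference between H and J = ',
          colindex('J') - colindex('H'):1)
end.
```

The raw ordinal difference between adjacent columns H and J is 2, whereas the corrected indices differ by 1. DISTANCE as listed uses the raw ordinals, which is how the overestimate arises.
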

Only on Day 3, with the foundations well prepared, was the search process 
introduced: subprograms SMALLEST, OPENNODE, NEWNODES, BACKTRAX, 
and INIT were added. The fact that programs can and should be built in stages is an 
important lesson for the novice — i.e. anyone with less than two years’ experience. 
It can make the difference between getting a system working in a week and taking a 
whole month. 

At first, amazingly, it appeared that everything worked a treat first time around. 
The program found sensible routes and displayed them correctly. The air was filled 
with the author’s shrieks of pleasure and astonishment. But in fact two more days 
were to elapse before JOURNEYS was ready for public scrutiny. 

The crux of the matter was determining line crossings. There is a penalty, 
currently set at 4, for crossing lines. This means that it costs as much as passing 
4 extra stations to change lines (circa 10 minutes). The problem was defining the 
notion of changing lines, which is rather obvious to humans from the map, precisely 
enough for the computer in terms of the data structure used. 

The first approximation assumed that, in going from station A via B to C, if A 
and C shared a common line then no change was needed. But a glance at Fig. 14.4 
overleaf refutes this. 

If we go from Piccadilly Circus to Green Park to Oxford Circus we must change 
lines; yet the two end points, Piccadilly Circus and Oxford Circus, share a common 
line, the Bakerloo. The program did not realize that a change was involved in such 
cases and produced some weird routes. 

The second approximation considered three consecutive stations and assumed 
that if all three shared a common line then no change was necessary. This worked 
better, but failed where two or three lines ran in parallel for a while. For example, 
in travelling from Baker St to Earls Court at least one change must be made, from 
Circle or Metropolitan to the District Line. Yet at no stage on the route are there 3 
consecutive stations which do not share a line. So again the program miscalculated. 

[Figure 14.4 Changing lines: diagram of the Bakerloo (B), Victoria (V) and 
Piccadilly (P) lines around Green Park] 

It was only at the third attempt that a reasonably satisfactory definition was 
formulated. As a path was extended it was assumed that only those lines common 
to all stations on it since the last line change were being used, normally a single 
line, but sometimes two or three where they run alongside. Then when a successor 
station had no lines in common with the path being followed a change could be 
asserted and the new value of the field WHICHWAY initialized with the line or lines 
shared by the successor and its immediate predecessor. What this means in Pascal 
terms can be seen at the start of procedure OPENNODE, where WHICHWAY gives 
the set of lines followed so far and WAYS is the line or lines used on the next step. 

A bonus from getting the definition of line crossing more or less right was that 
the route displayed by BACKTRAX was much less cluttered. 
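
The third definition of a line change can be tried on the Piccadilly Circus to Green Park to Oxford Circus example in isolation. This is a sketch rather than an extract from the program: the three-member LINE type and the PATH array are assumptions for the example, and the line sets are cut down to the three lines shown in Fig. 14.4.

```pascal
program changes (output);
(* sketch: counting line changes along a fixed path by set intersection *)
const npath = 3;
type
  line = (bakerloo, piccadilly, victoria);
  lineset = set of line;
var
  path : array [1..npath] of lineset;
  ways : lineset;
  i, nchanges : integer;
begin
  path[1] := [bakerloo, piccadilly];   (* Piccadilly Circus *)
  path[2] := [piccadilly, victoria];   (* Green Park *)
  path[3] := [bakerloo, victoria];     (* Oxford Circus *)

  ways := path[1];     (* lines that could be in use so far *)
  nchanges := 0;
  for i := 2 to npath do
    if ways * path[i] = [] then
    begin              (* nothing in common: a change is forced *)
      nchanges := nchanges + 1;
      ways := path[i] * path[i-1]
    end
    else
      ways := ways * path[i];    (* narrow down to the shared lines *)
  writeln('line changes: ', nchanges:1)
end.
```

This reports one line change, at Green Park. The first approximation, which looked only at the two end points, would have found the Bakerloo common to both and reported none.
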

Even this definition, however, still has loopholes: for example it ignores branch 
lines, where a single line forks. At such points it may be necessary to change trains. 


14.7.2 Inadequacies and improvements 


A major inadequacy has just been alluded to. The program is not aware of branch 
lines. It also makes mistakes where two lines run parallel but one has fewer stops 
than the other. 

A simple solution to the branch-line problem would be to treat the branches as 
distinct lines. Then, for instance, Baker St would lie on three versions of the 
Metropolitan Line (see Fig. 14.5). 

The difficulty here is that we know that, in going from Finchley Rd to Edgware 
Rd or vice versa we must always change, while in passing between Great Portland St 
and Finchley Rd or Edgware Rd we may not have to. The program would treat all 
changes as equally costly. 

[Figure 14.5 Three-way split: Baker St linked to Finchley Rd, Edgware Rd and 
Great Portland St by separately numbered versions of the Metropolitan Line] 


In Paris this problem only arises at one station, the aptly named La Fourche on 
(unlucky) line 13. In other cases where a line forks two names are used for the two 
branches. For example at Louis Blanc line 7 splits into lines 7 and 7b, indicating 
which branch requires a change. But then the French are reputed to be a very logical 
nation. 

For London I would settle for an extra table of fork-points, which would include 
special details on such stations as Baker St, Camden Town, Earls Court and the like. 

Another deficiency can be illustrated at Baker St (Fig. 14.6). 

If one goes from Bond St to Finchley Rd through Baker St one must be travelling on 
the Jubilee Line, not the Metropolitan. But the program as it stands assumes that 
because Finchley Rd is directly linked to Baker St, and because Baker St, Bond St 
and Finchley Rd are all on the Jubilee Line, only two stops are involved. It 
ignores St Johns Wood and Swiss Cottage completely. 

A more serious problem, this time caused by the A* algorithm itself, arises in 
Paris (Fig. 14.7). 


[Figure 14.6 The Baker St irregulars: diagram of the Metropolitan Line (M) and 
Jubilee Line (J) around Baker St, showing St Johns Wood and Great Portland St] 

[Figure 14.7 Storming the Bastille: diagram showing Concorde on line 1 
(6 stations), Reuilly Diderot and Montgallet] 


The best way from Montgallet to Concorde is to take line 8 to Reuilly Diderot 
then change to line 1. This takes 10 stops plus one change (14 units). But the 
program prefers the route on line 8 all the way, taking 15 stops. This is not a bad 
route, but it exhibits a bug in the algorithm. When the search reaches Bastille by 
line 8 it has taken 4 stops. When it later reaches Bastille by the alternative route, 
having changed at Reuilly Diderot to line 1, it has already paid the penalty for a 
line change, so the net cost is 7. Therefore this avenue of investigation is abandoned 
and the node is not reopened, even though it would lead to a cheaper route. 
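
The arithmetic behind this anomaly can be laid out explicitly. The sketch below is not part of JOURNEYS; it simply takes the four figures quoted above as constants (with invented names) and, ignoring the heuristic estimate, works out what each way of arriving at Bastille leaves still to pay.

```pascal
program momentum (output);
(* sketch of the arithmetic behind the Bastille anomaly; figures from the text *)
const
  vialine8 = 4;     (* cost on reaching Bastille by line 8 *)
  vialine1 = 7;     (* cost on reaching Bastille after changing to line 1 *)
  total8 = 15;      (* whole journey on line 8 *)
  total1 = 14;      (* whole journey changing at Reuilly Diderot *)
begin
  writeln('still to pay after arriving by line 8: ', total8 - vialine8:2);
  writeln('still to pay after arriving by line 1: ', total1 - vialine1:2);
  if (vialine8 < vialine1) and (total8 > total1) then
    writeln('the cheaper arrival leads to the dearer journey')
end.
```

The cheaper arrival at Bastille leaves 11 units still to pay and the dearer one only 7. Judged on cost so far, the better overall route loses the comparison.
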

The real problem is that the A* algorithm assumes that the best way to get 
from X via Y to Z is to take the best way from X to Y then the best way from 
Y to Z. This is a reasonable presumption, but it neglects what might be called 
the ‘momentum effect’; the best way from where you are to where you are 
going depends on how you got there. A cost that has been incurred already, like a 
line change, may reap dividends in the future. 

Do not take away from this critique the impression that the program is of no 
use: it almost always picks an optimal route, and always picks a good one. The 
criticisms merely demonstrate that the application of a standard algorithm to a 
real-life problem tends to unearth one or two annoying inconsistencies. They can 
all be ironed out in the end — at a price. The price is usually a more complex 
program, a more complex data structure or both. It all depends what you are 
prepared to pay. (Sometimes a flash of insight will reveal a way of improving a 
program and simplifying it too, but we cannot all be geniuses.) 

If we had been commissioned to create the ultimate urban transit route planner 
with no expense spared we might take into account 

    which stations are closed at certain times, 
    the frequency of trains on different lines, 
    which interchanges are easy and which are hard, 
    how far apart adjacent stations are, 


and more besides. You might even care to contemplate (for the International Year 
of Disabled People) a route finder for people in wheelchairs that only uses 
interchanges which do not involve stairs or escalators. Changing from the Northern 
Line southbound to the southbound Victoria Line at Euston would be all right 
(straight across platforms) but changing from a northbound to a southbound 
platform would not be. The program would come up with some really curious 
routes, since the only way between certain stations might involve a long detour to 
somewhere that a change could be made without climbing any stairs. 

The information is obtainable: the world awaits its use. (But how is the disabled 
passenger to get down to the platform in the first place?) 

Three other improvements are worth mentioning briefly before concluding: 
improved data format; displaying the second-best route; and graphic presentation. 

The use of type LINE leads to rigidity: we need unique names for all lines on all 
subway systems. This is impracticable. It would be better to read in the line names 
with the data, and associate them with numbers internally. It would also be better 
not merely to indicate that a station can be reached directly from another station 
but to associate the line used with the connection. The lines are more properly 
features of the linkages between stations than the stations themselves. Then if two 
stations were linked by two separate lines, two separate connections would be stored. 

This would eliminate the station-skipping problem between Baker St and 
Finchley Rd and at other places, but it would make typing the data file more 
arduous. Therefore the program should take over some of the work. Instead of 
typing each station with all its links etc. the user could type the sequence of 
neighbouring stations that constitute a line and let the program work out the 
linkages for itself. This would be more natural. 
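
The idea of letting the program work out the linkages can be sketched as follows. This is not code from the route finder: the record type, the array sizes and the station numbers are all invented for illustration, and a real version would read the station sequence from the data file.

```pascal
program linkdemo;
(* Hypothetical sketch: derive station-to-station links from an  *)
(* ordered list of the stations on one line.                     *)
const
  maxlinks = 20;
type
  link = record
           a, b : integer;     (* station numbers at each end *)
           line : integer;     (* internal number of the line used *)
         end;
var
  links  : array [1..maxlinks] of link;
  nlinks : integer;
  stops  : array [1..4] of integer;
  i      : integer;

procedure addlink (x, y, l : integer);
(* stores one connection, tagged with the line that provides it *)
begin
  nlinks := nlinks + 1;
  links[nlinks].a := x;
  links[nlinks].b := y;
  links[nlinks].line := l;
end;

begin
  nlinks := 0;
  (* four neighbouring stations on line 1, in order (made-up numbers) *)
  stops[1] := 11; stops[2] := 12; stops[3] := 13; stops[4] := 14;
  for i := 1 to 3 do
    addlink(stops[i], stops[i+1], 1);
  for i := 1 to nlinks do
    writeln(links[i].a:4, links[i].b:4, '  line ', links[i].line:1);
end.
```

Because each link carries its own line number, two stations joined by two lines would simply appear twice, once for each line.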

A second improvement would be for the program to work out and show, if 
requested, the next-best route. One way to do this would be to break the final link 
of the best route and increase the costs of the penultimate node by some large 
arbitrary amount, forcing the program to search for alternative pathways, the best 
of which could then be displayed as before. 

Finally, the system cries out for a really good graphic presentation (in colour). 
This would be an excellent way of showing the search in progress and of presenting 
the final route. At the moment the map reference for each station is too imprecise 
to assign it a unique screen location, but a little amateur cartography would soon 
fix that. An eye-catching screen-based display would require as much work again as 
the original program, but it could turn it from an academic curiosity into a saleable 
product. 


PART THREE 


Pascal at Play 


‘God does not play dice with the universe.’ Albert Einstein 


15 Case study 3 (Football Simulation) 


We now turn to the lighter side of computing, a fascinating subject in its own right. 


15.1 Computer games 


When computers cost millions of dollars, weighed thousands of pounds and 
consumed hundreds of kilowatts the idea of using one purely for entertainment 
was held to be sacrilegious. Nowadays one can hardly sit in a cafe or pub 
undisturbed by the sounds of simulated spacecraft being destroyed. The recent 
proliferation of computerized arcade games can leave no one in any doubt that 
computer technology plays a major role in the entertainment industry and that 
entertainment plays a major role in the computer industry. 

There are many ways of having fun with computers (controlling homemade 
robots, drawing pictures, making music, writing poems and so forth) but computer 
games must take the centre stage. There is an immense variety of them: we will 
adopt a fourfold classification — 


games of chance, 
fantasy games, 
games of pure skill, 
video games. 


Most card and dice games fall into the first category, including those with a 
large element of skill as well as luck, such as Backgammon and Poker. To 
discourage gambling, and to endorse Einstein’s view that God is not a 
random-number generator, we will not emphasize this category; though you 
might be interested to know that a computer program has beaten the reigning 
world Backgammon champion (Berliner, 1980). 

The second class, fantasy games, includes such favourites as Adventure, Star 
Trek, Warlock’s Castle and Zork. As personal computers have become more 
commonplace, there has been a steady if unspectacular spread of such games. 
Today, if you own one of the more common brands of microcomputer, you 
can buy for a few pounds one from hundreds of tapes or discs each containing 
a novel game of this type. They are essentially automated fairy-tales or 
science-fiction stories. They fulfil much the same need and sell to much the 
same audience as superhero comics or TV soap operas, with one important 
difference: you can play a starring part in the story as it unfolds. A good 
quality fantasy game would actually be too complex to be a suitable example 
for this book, so we will pay little further attention to them. (Writing one 
effectively means writing a novel and a computer program at the same time and 
coordinating the two — no mean undertaking.) The interested reader is referred 
to the special issue of Byte magazine on the subject, published in December 1980. 

Thirdly there are the classic games of intellectual skill such as Chess, Draughts, 
Go and Othello. This is the most traditional area of computer game-playing. In 
fact a chess automaton was constructed as long ago as 1890 by the Spaniard 
Leonardo Torres y Quevedo (Bell, 1978). Even Babbage toyed with the idea 
(if you will pardon the phrase) of making his Analytical Engine play Noughts 
and Crosses, charging admission for people to watch it, and thus helping to fund 
the project. Automating games of mental skill has been popular for a long time 
because it does not require advanced graphic facilities (a teletype will do) and 
because the early thinkers on artificial intelligence, among them the eminent 
mathematicians Alan Turing and Claude Shannon, saw a successful chess program 
as a touchstone of machine cognition. We will postpone further discussion of this 
interesting topic till Chapter 16, where a Go-Moku program is presented. 

The last category is typified by Space Invaders and its many derivatives. The 
mushroom growth of such games had to await the development of cheap colour 
graphic displays and the microprocessors to drive them, but few people predicted 
the use to which this technology would be put. Most of these games appeal to 
our baser natures: the human players hone their dexterity to a high pitch while 
vicariously satisfying their killer instincts. Children often find them addictive, and 
the skill exhibited by expert players (usually under 18) has to be seen to be 
believed. There have been reports on the one hand from Japan of muggings and 
even murder committed to obtain money to play arcade games and on the other 
hand from Nottingham of a spastic boy whose coordination and speech improved 
wonderfully as a result of practice at Space Invaders. For good or ill, this aspect of 
computer usage is certainly a significant social phenomenon. 

The example program in this chapter is a video game, though a very peaceful one 
with a rather low level of user interaction. 


15.2 Soccer simulation 


Our example program shows how Pascal may be used to simulate a five-a-side 
football game. (Having the full 11 players per side would make the screen 
overcrowded.) It brings us up against two problems which lie outside Pascal: 
timing and screen-handling. 

Timing considerations are important for two reasons: (1) the game should be 
played neither too fast nor too slowly to present a watchable spectacle; (2) if the 
user is to interact with the game, its speed must be comparable to his or hers. 
Making the game run at an acceptable speed is largely a matter of trial and error. 
The procedure WAIT allows insertion of a variable delay, and this can be used to 
adjust the game to your own rate. 

Screen-handling is a problem because there is no universal standard. Video 
displays may be memory-mapped or not, they may provide low, medium or high 
resolution graphics and they may or may not have colour. In addition sizes vary 
greatly: common heights are 16, 20, 24, 25 and 32 lines, common widths are 32, 
40, 64, 80 and 132 columns, and you can find almost any combination of the two 
— sometimes switchable on the one machine! 

The program here runs on the RML 380Z, which has a rather small screen of 
24 lines by 40 columns. It is designed to be portable to almost any other screen. 
For this reason it does not use colour or full graphics, merely a few cursor-control 
facilities likely to be available on any visual display unit. Also the constants 
DEEP=20 and WIDE=32 which define the size of the pitch can be altered for 
other machines. The playing area can be scaled up or down to any reasonable size 
by altering these, provided that the screen has at least DEEP+4 lines of WIDE+4 
columns. 

The program requires no special graphics hardware or software (though it could 
be modified to make use of them). All it uses are four special control characters, 
which are localized in procedures HOME, CLRS and PLOT. The codes for these on 
the 380Z are listed below. 


Clear screen              chr(12) 
Home cursor (top left)    chr(29) 
Move down                 chr(10) 
Move right                chr(24) 


Almost all screens provide these minimal capabilities. (Notice especially that there 
are no PEEKs or POKEs of arbitrary memory locations.) 

The PLOT procedure puts a single character at any given screen location by 
homing the cursor, then moving down the requisite vertical distance and moving 
across the requisite horizontal distance. (It works within the bounds 0..DEEP+3 
by 0..WIDE+3.) This is not the fastest method, but it is the most portable. If your 
screen provides direct cursor addressing, only PLOT need be modified to make use 
of it. All other screen addressing is carried out by calling PLOT. 
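
As an illustration of how little needs to change, here is a sketch of PLOT for a terminal with direct cursor addressing. It assumes the ANSI (VT100-style) sequence ESC [ row ; column H, which counts rows and columns from 1; the 380Z does not necessarily obey it, so check your own manual before relying on it.

```pascal
program plotdemo;
(* Sketch only: assumes an ANSI-style terminal, not the 380Z. *)

procedure plot (a,b : real; c : char);
(* puts character c at row a column b, both counted from 0 *)
begin
  (* ESC [ row ; col H moves the cursor; ANSI counts from 1 *)
  write(chr(27), '[', round(a)+1:1, ';', round(b)+1:1, 'H', c);
end;

begin
  plot(5.0, 10.0, '*');
  writeln;
end.
```

The heading and parameters of PLOT are unchanged, so every caller in the football program would carry on working without alteration.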


15.2.1 Plays and players 
The playing area is divided into quarters (Fig. 15.1). 

[Figure: the pitch, with a goal at each end; rows run from 1 to DEEP and columns from 1 through Q1, Q2 and Q3 to WIDE.] 

Figure 15.1 The pitch 


Each side has one goalkeeper, two defenders and two attackers. Different 
players are restricted to different zones. Let us call the team playing from left 
to right the home side. The home side’s goalkeeper is restricted horizontally to 
the zone from 1 to Q1; the home defenders must stay between 1 and Q3; and the 
attackers between Q1 and WIDE. The away side’s zones are mirror images: away 
attackers have the same range as home defenders, and vice versa. All players can 
move vertically from 1 to DEEP. 

When the game begins all players are placed (by procedure KICKOFF) in 
standard starting positions. 

All movement is carried out by procedure MOVE, which itself calls PLOT. 
MOVE will move any player or the ball to any given row and column. The ball 
is allowed to move twice as far in one go as a player. If the ball or player is 
attempting to move too far or to go outside the permitted zone MOVE enforces 
the limit. Then it effects the transition in 8 steps, each moving one eighth of the 
distance. This gives an impression of motion: it is somewhat jerky, but it is 
better to make the move in several small steps than all at once. 


15.2.2 The play’s the thing 

We are attempting to approximate a situation where many players move around 
simultaneously by a succession of discrete events. How this is done can best be 
seen from a simplified outline of the main program and the two top-level routines 
ATTACKS and DEFENDS. 

program football;
type players = (* suitable integer subrange *);

var goal : boolean;
    half : 1..2;
    p,withball : players;

(* other declarations and procedures omitted *)


procedure attacks (p : players);
(* p makes an attacking move *)
begin
  if p=withball then              (* p has possession *)
    if neargoal(p) then
      if nearball(p) then
        shot(p)                   (* p tries to score *)
      else
        (* move closer to ball *)
    else
      if blocked(p) then          (* danger of tackle *)
        pass(p)
      else
        dribble(p)
  else                            (* p does not have ball *)
    advance(p);                   (* move upfield *)
end; (* attacks *)


procedure defends (p : players);
(* p makes defensive play *)
begin
  if nearball(p) then
    tackles(p)
  else
    retreat(p);                   (* cover own goal *)
end; (* defends *)


begin (* main line *)
  init;                           (* initial set-up *)
  for half := 1 to 2 do
    begin
      kickoff;
      repeat
        (* display ball and players *)
        pick(p);                  (* select player to act *)
        if (* p's team has ball *) then
          attacks(p)
        else
          defends(p);
        if goal then
          kickoff;                (* update scoreboard & restart *)
      until (* end of half *);
      swapover;                   (* change ends at half-time *)
    end; (* of game *)
end.


This introduces the following general-purpose procedures. 


INIT        prepare for the game 
KICKOFF     restart from standard positions 
PICK        choose next player to move 
SWAPOVER    change ends 


The crucial one is PICK. This routine decides which player will move next. In the 
program as written it chooses one of four equally probable alternatives: 


(1) p remains as before, i.e. the last player to act carries on; 
(2) p is the player with the ball; 
(3) p is the nearest player to the ball (not always the same as 2); 
(4) p is chosen purely at random. 


This is a compromise between continuity and diversity. In practice it works out 
reasonably well: the player with the ball tends to make a coherent series of moves, 
but does not hog the play for very long; and there is some ‘running off the ball’. 
Underlying it all, as usual, is a random-number generator (see Chapter 10). 

(This shows that Pascal can be used for discrete event simulation. There are 
special languages for this purpose, e.g. ECSL based on Fortran and Simula based 
on Algol, but Pascal is quite adequate in the present instance.) 

The outline program also introduced three tests and five event routines: 


NEARBALL    true if p near to ball 
NEARGOAL    true if p near opposing goal 
BLOCKED     true if p near an opponent 

PASS        p kicks ball to another member of own side 
DRIBBLE     p moves upfield with the ball 
SHOT        p attempts to score a goal 
ADVANCE     p moves upfield without ball 
RETREAT     p moves back to cover goal. 


To see how these work, study the program listing in the next section. Nearly 
all of them call RAND at some point. The test functions all call DISTANCE and 
the action procedures call MOVE to make things happen. In addition some of 
the action routines call one another, e.g. SHOT may call PASS and TACKLES 
may call DRIBBLE. 


15.3 The football program 


The constants Q1, Q2 and Q3 declared below are defined in terms of WIDE. This 
is permissible (and very useful) in Pascal/Z, though non-standard. If your version 
of Pascal does not allow it, you will have to declare Q1, Q2 and Q3 as global 
variables and initialize them with the same values as here. 
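
For a compiler that rejects constant expressions, the alternative might look like the following fragment, shown here as a complete program so that it can be tried on its own. It is a sketch rather than part of the listing; in the football program the three assignments would presumably go at the start of INIT.

```pascal
program quarters;
(* Sketch: Q1, Q2 and Q3 as variables instead of constants. *)
const
  wide = 32;
var
  q1, q2, q3 : integer;
begin
  q1 := wide div 4;
  q2 := wide div 2;
  q3 := q1 + q2;              (* three quarters of the width *)
  writeln(q1:4, q2:4, q3:4);
end.
```

With WIDE=32 the three quarter marks fall at columns 8, 16 and 24.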


program football;
(* 5-a-side football game for 380Z *)
(* by Richard Forsyth, PNL, Aug 81 *)

const
  halftime = 45;
  deep = 20; wide = 32;        (* pitch dimensions *)
  q1 = wide div 4;             (* allowed in Pascal/Z *)
  q2 = wide div 2;
  q3 = q1+q2;                  (* 1,2,3 quarters *)
  blob = 42;                   (* image of ball *)
  ball = 0;
  null = ' ';                  (* blank character *)
  homeside = 'ARGENTINA';
  awayside = 'HOLLAND';
                               (* team names *)

type
  byte = 0..255;
  cardinal = 0..maxint;
  players = 0..10;             (* zero is ball *)
  (* 1..5 home side, 6..10 away side *)

var
  half : 1..2;
  tick : cardinal;             (* clock ticks *)
  name : array [players] of char;
  down,east : array [players] of real;
                               (* row and column coordinates *)
  emin,emax : array [players] of byte;
                               (* players' horizontal limits *)
  myplayer,p,pn,withball : players;
  goal : boolean;
  seed : cardinal;             (* random seed *)
  homegoal,awaygoal : cardinal;   (* scores *)
  hometeam,awayteam : set of players;
                               (* members of each team *)


procedure home;  (* home cursor, top left *)
begin
  write(chr(29));
end; (* of home *)


procedure clrs;  (* clear screen *)
begin
  write(chr(12));
end; (* of clrs *)


procedure plot (a,b : real; c : char);
(* puts character c at row a column b *)
var down,east,j : cardinal;
begin
  down := round(a);
  east := round(b);
  home;
  for j := 1 to down do
    write(chr(10));            (* line feed *)
  for j := 1 to east do
    write(chr(24));            (* right shift *)
  write(c);
end; (* of plot *)


procedure wait (t : integer);
(* delay routine *)
begin
  t := t * 10;
  while t > 0 do
    t := t - 1;
end;


procedure init;
(* sets up global values *)
var p,g : players;
begin
  (* first name players *)
  name[ball] := chr(blob);
  for p := 1 to 5 do
    name[p] := chr(ord('0')+p);
  for p := 6 to 10 do
    name[p] := chr(ord('a')+p-6);
  (* now set limits on movement *)
  emin[ball] := 0;  emax[ball] := wide+1;
  emin[1] := 1;     emax[1] := q1;      (* goalie *)
  emin[2] := 1;     emax[2] := q3;
  emin[3] := 1;     emax[3] := q3;      (* defenders *)
  emin[4] := q1;    emax[4] := wide;
  emin[5] := q1;    emax[5] := wide;    (* attackers *)
  (* now other goalie *)
  emin[6] := q3;    emax[6] := wide;
  g := 5;
  for p := 7 to 10 do                   (* other side *)
    begin
      emin[p] := emin[g];
      emax[p] := emax[g];
      g := g - 1;
    end;                                (* mirror image *)
  (* now set various globals *)
  seed := 0;  withball := 0;
  homegoal := 0;  awaygoal := 0;
  hometeam := [1..5];
  awayteam := [6..10];
end; (* of init *)


procedure ballpark;
(* marks out the pitch *)
var j : byte;
begin
  clrs; home;
  write('+');
  for j := 1 to wide do write('-');
  writeln('+');                  (* top line *)
  for j := 1 to deep div 2 - 2 do
    writeln('!',null:wide,'!');
  for j := 1 to 4 do writeln;    (* goal area *)
  for j := 1 to deep div 2 - 2 do
    writeln('!',null:wide,'!');
  write('+');
  for j := 1 to wide do write('-');
  writeln('+');                  (* bottom line *)
  wait(100);
end; (* of ballpark *)


function distance (row1,col1,row2,col2 : real) : real;
(* Euclidean distance *)
begin
  distance := sqrt(sqr(row1-row2)+sqr(col1-col2));
end; (* of distance *)


function rand : real;
(* random number generator *)
const
  c = 824.0; k = 0; m = 10657.0;
var s : real;
begin
  if seed=0 then                 (* re-start sequence *)
    begin
      plot(deep+2.0,24.0,null);
      write('lucky no.?');
      readln(seed);              (* seed is global *)
    end;
  (* real arithmetic to avoid integer overflow *)
  s := seed * c + k;
  s := s - trunc(s/m) * m;       (* s mod m *)
  seed := round(s);
  rand := s/m;                   (* between 0.0 and 1.0 *)
end; (* of rand *)


procedure kickoff;
(* re-starts from standard positions *)
var p : players;
begin
  goal := false;
  ballpark;
  (* now put players in place *)
  east[ball] := q2;              (* halfway *)
  east[1] := 1;      east[6] := wide;
  east[2] := q1;     east[7] := q3;
  east[3] := q1;     east[8] := q3;
  east[4] := q2-1;   east[9] := q2+1;
  east[5] := q2-1;   east[10] := q2+1;
  down[ball] := deep / 2.0;
  down[1] := down[ball];
  down[2] := deep / 4.0;
  down[3] := down[1] + down[2];
  down[4] := down[2] - 1.0;
  down[5] := down[3] + 1.0;
  for p := 6 to 10 do
    down[p] := down[p-5];
  (* now show them *)
  for p := 0 to 10 do
    plot(down[p],east[p],name[p]);
  (* display latest score also *)
  plot(deep+2.0,7.0,null);
  write(homeside:12,homegoal:4);
  plot(deep+3.0,7.0,null);
  write(awayside:12,awaygoal:4);
  wait(256);
end; (* of kickoff *)


procedure swapover;
(* changes ends at half time *)
var p : players; swap : char;
begin
  (* simplest just to exchange shirts *)
  for p := 1 to 5 do
    begin
      swap := name[p];
      name[p] := name[p+5];
      name[p+5] := swap;
    end;
end; (* of swapover *)


procedure move (p : players; rows,cols : real);
(* moves p to location rows/cols *)
var h,v,dh,dv,diffmax : real;
    i : byte;
begin
  v := rows + rand - rand;
  h := cols + rand - rand;
  (* small random perturbations *)
  (* first do vertical displacement *)
  if p=ball then diffmax := deep
  else diffmax := deep / 2.0;          (* ball moves faster *)
  dv := v - down[p];
  if dv > diffmax then dv := diffmax
  else if dv < (-diffmax) then dv := -diffmax;
  v := down[p] + dv;
  (* horizontal displacement next *)
  if p=ball then diffmax := wide / 2.0
  else diffmax := wide / 4.0;
  dh := h - east[p];
  if dh > diffmax then dh := diffmax
  else if dh < (-diffmax) then dh := -diffmax;
  h := east[p] + dh;
  (* now enforce limits *)
  if v > deep then v := deep
  else if v < 1 then v := 1;
  if h > emax[p] then h := emax[p]
  else if h < emin[p] then h := emin[p];
  (* now do it in 8 steps *)
  dv := (v-down[p]) / 8.0;
  dh := (h-east[p]) / 8.0;
  for i := 1 to 8 do
    begin
      plot(down[p],east[p],null);      (* old place *)
      down[p] := down[p] + dv;
      east[p] := east[p] + dh;
      plot(down[p],east[p],name[p]);   (* new place *)
    end;
end; (* of move *)


procedure pick (var p : players);
(* selects next player to move *)
var r : byte; n,nearest : players;
    d,dmin : real;
begin
  if p=0 then p := trunc(rand*10) + 1;
  (* find nearest player to ball *)
  nearest := 1;
  dmin := distance(down[1],east[1],down[ball],east[ball]);
  for n := 2 to 10 do
    begin
      d := distance(down[n],east[n],
                    down[ball],east[ball]);
      if d<dmin then
        begin
          dmin := d;  nearest := n;
        end;
    end;
  if (withball = 0) or (rand > 0.8) then
    withball := nearest;
  (* now choose who will act *)
  r := trunc(rand*4);
  case r of
    0: p := p;                     (* no change *)
    1: p := withball;              (* player with ball *)
    2: p := trunc(rand*10) + 1;    (* pure chance *)
    3: p := nearest;               (* closest to ball *)
  end; (* of selection *)
end; (* of pick *)


function near (diff : real) : boolean;
(* how close is close *)
begin
  diff := diff * rand;           (* random perturbation *)
  if diff < (wide/8.0) then
    near := true
  else
    near := false;
end; (* of near *)


function nearball (p : players) : boolean;
(* true if p near enough to ball *)
var d : real;
begin
  d := distance(down[p],east[p],down[ball],east[ball]);
  nearball := near(d);
end; (* of nearball *)


function neargoal (p : players) : boolean;
(* true if p near enough to shoot *)
var d : real; edge : byte;
begin
  if p in hometeam then edge := wide
  else edge := 1;                (* opposing goalmouth *)
  d := distance(down[p],east[p],deep/2.0,edge*1.0);
  neargoal := near(d);
end; (* of neargoal *)


function blocked (p : players) : boolean;
(* true if p threatened by tackle *)
var j,n : players; d,dmin : real;
begin
  if p in hometeam then n := 6
  else n := 1;                   (* first in other side *)
  dmin := wide * deep;           (* large value *)
  for j := n to n+4 do
    begin
      d := distance(down[p],east[p],down[j],east[j]);
      if d<dmin then
        dmin := d;
    end;
  if near(dmin) then blocked := true
  else blocked := false;
end; (* of blocked *)


(** event routines **)


procedure pass (p : players);
(* p passes ball to a team-mate *)
var n : players; edge : byte;
begin
  if p in hometeam then
    begin
      n := 1;  edge := wide;
    end
  else
    begin
      n := 6;  edge := 1;
    end;
  n := trunc(rand*5) + n;              (* player to receive ball *)
  move(p,down[ball],east[ball]);       (* get close *)
  if p=n then                          (* don't pass to self *)
    begin                              (* just kick upfield *)
      withball := 0;
      move(ball,down[p],(edge+east[p])/2.0);
    end
  else                                 (* proper pass *)
    begin
      move(ball,down[n],east[n]);
      withball := n;                   (* receiver has it *)
    end;
end; (* of pass *)


procedure dribble (p : players);
(* p moves upfield with ball *)
var h,v : real;
begin
  move(p,down[ball],east[ball]);       (* get really close *)
  if p in hometeam then h := wide/4.0
  else h := -(wide/4.0);
  h := h * rand + east[ball];
  v := rand * 2.0 - 1.0 + down[ball];
  (* random fluctuations *)
  move(ball,v,h);                      (* kick *)
  move(p,v,h);                         (* and run *)
  withball := p;                       (* p has it *)
end; (* of dribble *)


procedure shot (p : players);
(* p tries to score a goal *)
var g : players; edge : byte;
    vertical : real;
begin
  move(p,down[ball],east[ball]);
  if p in hometeam then
    begin
      g := 6;                          (* opposing goalie *)
      edge := wide + 1;
    end
  else
    begin
      g := 1;  edge := 0;
    end;
  move(g,down[g],(east[g]+edge)/2.0);
  (* keeper tries to cover goal *)
  vertical := (deep/2)+(rand*4-2);     (* aim *)
  move(ball,vertical,edge*1.0);        (* shoot *)
  if (abs(edge-east[ball]) > 1.5) or
     (distance(down[g],east[g],
               down[ball],east[ball]) < (rand*4)) then
    begin                              (* saved *)
      withball := g;
      if rand < 0.5 then pass(g)
      else move(ball,deep/2.0,1.0+wide-edge);
      (* pass or boot it upfield *)
    end
  else                                 (* scored! *)
    begin
      withball := 0;  goal := true;
      (* update the scores *)
      if (half=1) = (p in hometeam) then
        homegoal := homegoal + 1
      else
        awaygoal := awaygoal + 1;
      move(ball,deep/2.0,edge*1.0);    (* ball flickers *)
      move(p,down[p],east[p]);         (* p jumps for joy! *)
    end;
end; (* of shot *)


procedure advance (p : players);
(* p moves forward without ball *)
var d,vertical : real;
begin
  d := wide / 4.0;
  if p in awayteam then d := -d;
  vertical := rand * deep + 1.0;
  move(p,vertical,east[p]+d);
end; (* of advance *)


procedure retreat (p : players);
(* p moves back to protect own goal *)
var dh,vertical : real;
begin
  if p in hometeam then
    dh := (1.0-east[p]) * 0.25
  else
    dh := (wide-east[p]) * 0.25;
  vertical := (rand*deep+1.0 + down[p]) * 0.5;
  move(p,vertical,east[p]+dh);
end; (* of retreat *)


procedure tackles (p : players);
(* attempts to get ball from player with it *)
var d1,d2 : real;
begin
  move(p,down[ball],east[ball]);
  d1 := distance(down[ball],east[ball],down[p],east[p]);
  d2 := distance(down[ball],east[ball],
                 down[withball],east[withball]);
  if d1<d2 then                        (* success *)
    begin
      withball := p;
      if rand > 0.5 then dribble(p);
    end
  else                                 (* failure *)
    dribble(withball);
  (* opponent moves away *)
end; (* of tackles *)


procedure attacks (p : players);
(* p makes attacking move *)
begin
  if p=withball then                   (* p in possession *)
    if neargoal(p) then
      if nearball(p) then
        shot(p)                        (* try to score *)
      else
        move(p,down[ball],east[ball])
    else
      if blocked(p) or (p in [1,6]) then
        pass(p)                        (* goalies don't dribble *)
      else
        dribble(p)
  else                                 (* p does not have ball *)
    advance(p);                        (* move upfield *)
end; (* of attacks *)


procedure defends (p : players);
(* p makes defensive play *)
begin
  if nearball(p) or (rand>0.75) then
    tackles(p)
  else
    retreat(p);                        (* cover own goal *)
end; (* of defends *)


procedure instruct;
(* give user instructions *)
begin
  clrs; home;
  writeln('welcome to the 5-a-side football game.');
  writeln('Sorry - no instructions yet!');
  (* should inform user what to do here *)
  wait(400);
end; (* of instruct *)


begin  (* main line *)
  init;                        (* pre-match warm-up *)
  instruct;                    (* instructions for user *)
  for half := 1 to 2 do
    begin
      tick := 0;               (* referee starts watch *)
      p := 0;
      kickoff;                 (* start playing *)
      repeat
        for pn := 0 to 10 do
          plot(down[pn],east[pn],name[pn]);
        (* show the players *)
        pick(p);               (* pick player to act *)
        if (p in hometeam) = (withball in hometeam) then
          (* own side has possession *)
          attacks(p)
        else
          defends(p);
        if goal then
          kickoff;             (* update scoreboard & restart after goal *)
        tick := tick + 1;
        wait(2);
      until tick > halftime;   (* end of half *)
      swapover;                (* change ends *)
    end; (* of game *)
end.

No output is shown this time, since it is all ephemeral and ‘stills’ would not 
convey the action. You will have to run it if you really want to see what it looks 
like. 

The program does produce a series of happenings on the screen which bear some 
resemblance to football. The quality is not high and there are silly blunders 
(especially by goalkeepers) and an abundance of boring backpasses, but there are 
also some entertaining moments. If you are not careful you may even find yourself 
cheering an apparently well-planned inter-passing movement that leads to a goal. 
The ardent football fan will take several games to grow tired of it. 


15.4 Suggested improvements 


This program is really just a point of departure for your own projects. It can be 
improved in a number of ways. We shall discuss four possible enhancements. 


15.4.1 The RAND function 


The trouble with RAND is that there is no real-time clock on our 380Z, so to 
re-seed the random sequence we ask the user for a ‘lucky number’. This is rather 
artificial and irritating. It also means that once you have tried a particular starting 
value, you know how the game will turn out. (At one stage during testing the 
number 1977 could be relied upon to give Holland a 7—0 victory!) 

The function could be improved by restarting the sequence using a real-time 
clock if you have one or by looking at arbitrary unused memory locations if you 
can get at them. You might also wish to try a random series with a longer period 
(refer to Chapter 10). 
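
Before looking for a longer series it is worth knowing how long the present one is. The following sketch, which is not part of the football program, counts how many steps the generator used in RAND (multiplier 824, modulus 10657, increment 0) takes to return to a starting value of 1. The starting value and the safety limit of 11000 steps are arbitrary choices made for illustration.

```pascal
program period;
(* Sketch: measures the cycle length of the chapter's generator. *)
const
  c = 824.0; m = 10657.0;
var
  s, start : real;
  n : integer;
begin
  start := 1.0;
  s := start;
  n := 0;
  repeat
    s := s * c;
    s := s - trunc(s/m) * m;     (* s mod m, in real arithmetic *)
    n := n + 1;
  until (s = start) or (n >= 11000);
  writeln('steps before the sequence repeats: ', n);
end.
```

Since the seed is always less than 10657, the cycle can never be longer than 10656 steps, whatever value the user types in.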


15.4.2 A better display 


For reasons of portability, as explained earlier, the program assumes only minimal 
cursor-control facilities. But your computer may have multi-coloured high-resolution 
graphics. To make use of such facilities only the PLOT procedure needs alteration, 
though you may also want to have the teams in two different colours (rather 
than calling the players 1,2,3,4,5 and A,B,C,D,E), and this would entail changing 
the contents of the NAME array. This is a case where you will have to consult 
your user manuals. There is no agreed standard on how to control a graphics display. 


15.4.3 Richer variety of events 


The event routines could do with some elaboration. In particular a player should 
not attempt to dribble upfield when he is at the edge of his zone of action. This 
implies that DRIBBLE should test how close the selected player is to his limit and, 
if he is on or very near it, make a pass instead. There is also a need to make the 
passing more intelligent. At present the ball is merely kicked towards any player 
of the same side, chosen at random. This sometimes means a back-pass from deep 
within the opponents’ half to the goalkeeper — which normally fails even to reach 
him. (Shades of the England international team during the 1970s!) Instead of 
doing this, the procedure ought to find the best-placed player of the same side. 
This player should not be too far away or the pass will not carry, and preferably 
should be further forward and not blocked by an opponent. A feasible method 
of selection would be to form the sum 


distance from opposing goal + distance from self 
- distance from nearest opponent 


for every other player on self’s side and pick the one with the lowest score 
to receive the pass. This is far from impracticable, and would improve the 
‘purposefulness’ of the play. 
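
One way the scoring rule might be coded is sketched below. The function name RECEIVER and the test positions are invented for illustration; DOWN, EAST and DISTANCE follow the listing in Section 15.3, and the goalmouths are taken to be at columns WIDE and 1 on the centre row, as in NEARGOAL.

```pascal
program bestpass;
(* Hypothetical sketch of the pass-selection rule; not from the listing. *)
const
  deep = 20; wide = 32;
type
  players = 0..10;
var
  down, east : array [players] of real;
  k : players;

function distance (row1,col1,row2,col2 : real) : real;
begin
  distance := sqrt(sqr(row1-row2) + sqr(col1-col2));
end;

function receiver (p : players) : players;
(* returns the team-mate of p with the lowest score *)
var first, other, n, j, best : players;
    goalcol, score, lowest, d, dopp : real;
begin
  if p <= 5 then
    begin first := 1; other := 6; goalcol := wide end
  else
    begin first := 6; other := 1; goalcol := 1 end;
  best := p;  lowest := 0.0;
  for n := first to first+4 do
    if n <> p then
      begin
        dopp := wide * deep;            (* large value *)
        for j := other to other+4 do    (* nearest opponent to n *)
          begin
            d := distance(down[n],east[n],down[j],east[j]);
            if d < dopp then dopp := d;
          end;
        score := distance(down[n],east[n],deep/2.0,goalcol)
               + distance(down[n],east[n],down[p],east[p])
               - dopp;
        if (best = p) or (score < lowest) then
          begin best := n; lowest := score end;
      end;
  receiver := best;
end;

begin
  (* made-up positions: everyone on the centre row *)
  for k := 0 to 10 do
    down[k] := deep / 2.0;
  east[0] := 12.0;
  east[1] := 1.0;   east[2] := 8.0;   east[3] := 10.0;
  east[4] := 12.0;  east[5] := 20.0;
  east[6] := 32.0;  east[7] := 26.0;  east[8] := 24.0;
  east[9] := 14.0;  east[10] := 9.0;
  writeln('player 4 passes to player ', receiver(4):1);
end.
```

In the football program PASS would call such a function in place of its random choice of receiver. The three terms are weighted equally here, which is only one of many possible choices.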

Additionally, there may be a case for other events such as corners, free kicks, 
penalties and even own goals. It is an interesting exercise to attempt to translate 
from footballing concepts into Pascal procedures — though it is hard to see how 
‘fouls’ would be implemented! Obviously something is lost in the translation, but 
it forces you to be precise about the meaning of phrases like ‘moving into space’ 
or ‘opening up the defence’ which you might think you understand quite well 
until attempting to pin them down. 

Of course if you dislike football you can always try your hand at Baseball, 
Cricket, Lawn Tennis or some other sport. 


15.4.4 User interaction 


The program at present is purely a spectator sport. The crowd cannot invade the 
pitch, so to speak. 

The viewer would get more involved if he or she could control one of the players. 
Different microcomputers have different facilities for this purpose. Some have one 
or two joysticks or ‘game paddles’, others have a bare keyboard; so it is hard to give 
general advice. 

Nevertheless, one widely applicable technique is to assign four keys to indicate 
up, down, left and right. The program then monitors the keyboard status 
at regular intervals and, if one of these keys has been pressed, moves the chosen 
player (which the user must select during initialization) one step in the indicated 
direction (but not off the pitch). 

Unfortunately, in Pascal/Z at least, this requires writing at least one assembly 
language routine to be linked in with the Pascal program. I do not want to go into 
details of that here, but if you are prepared to do it, the problem can be solved in 
the following manner. 

Write a Pascal-callable assembly language function KEYBOARD which returns 
chr(0) if the user has not pressed a key since the last console input and otherwise 
returns (but does not echo) the latest character typed. It should not wait for a key 
to be typed. Then declare it as external in the program 


function keyboard : char; external;


and alter the statement


if goal then
   kickoff;   (* update scoreboard and restart *)


near the end of the REPEAT loop in the main program as follows. 


if goal then
   kickoff   (* update scoreboard and restart *)
else
   begin   (* give user a chance to intervene *)
   ch := keyboard;
      (* read console without echo *)
   if (ch <> chr(0)) and (rand > 0.5) then
      (* chr(0) means that no key was struck *)
      begin
      case ch of
         'q' : down[myplayer] := down[myplayer] - 1;
            (* up *)
         'a' : down[myplayer] := down[myplayer] + 1;
            (* down *)
         'z' : east[myplayer] := east[myplayer] - 1;
            (* left *)
         'x' : east[myplayer] := east[myplayer] + 1;
            (* right *)
         (* ignore all other characters *)
      end;   (* of case *)
      move(myplayer,down[myplayer],east[myplayer]);
      end;
   end;
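

The fragment above does not itself stop the player from leaving the pitch. One way of adding that check is sketched below as a small free-standing program. The procedure STEP and the constants MAXDOWN and MAXEAST are hypothetical names invented for this sketch, and the pitch dimensions are assumed.

```pascal
program stepdemo;
   (* sketch: move one step per key press, but never off the pitch *)

const
   maxdown = 20;  maxeast = 60;   (* assumed pitch dimensions *)

var
   row, col : integer;

procedure step (ch : char; var r,c : integer);
   var nr,nc : integer;
begin
   nr := r;  nc := c;
   if ch in ['q','a','z','x'] then   (* ignore all other characters *)
      case ch of
         'q' : nr := r - 1;   (* up *)
         'a' : nr := r + 1;   (* down *)
         'z' : nc := c - 1;   (* left *)
         'x' : nc := c + 1;   (* right *)
      end;
   (* accept the new position only if it is still on the pitch *)
   if (nr >= 1) and (nr <= maxdown) and (nc >= 1) and (nc <= maxeast) then
      begin  r := nr;  c := nc;  end;
end;

begin
   row := 1;  col := 1;
   step('q',row,col);          (* blocked: already on the top edge *)
   writeln(row, ' ', col);     (* prints 1 1 *)
   step('x',row,col);          (* one step to the right *)
   writeln(row, ' ', col);     (* prints 1 2 *)
end.
```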


The INSTRUCT procedure will have to let the user choose MYPLAYER and tell
him or her which keys (Q,A,Z,X) to use for movement. These are reasonably
ergonomic on a QWERTY keyboard, although you may find another group of four
more comfortable.

This will give the user some control over one player’s movements. It will still be
up to the PICK procedure to make that player do things, though the nearer the ball
the more likely that would be.

Further refinements are left to your imagination.


16 


Case study 4 
(Go-Moku) 


In 1769 Wolfgang von Kempelen unveiled a chess-playing automaton to the 
Vienna Imperial Court — the ‘Turk’. It was a life-sized figure in Ottoman robes 
which manipulated the pieces on a chest in front of it with its left hand. 
Kempelen intended it as a demonstration of engineering virtuosity; but it achieved 
international repute as an ingenious conjuring trick. After Kempelen’s death in 
1804 it came into the possession of a Bavarian musician called Johann Maelzel, 
who was more of a showman than an engineer. He arranged its most celebrated 
feat, when in 1809 it defeated Napoleon three times. After the machine’s third 
victory the emperor swept the pieces from the board and stomped out of the room. 
(Napoleon had little chance to exercise his chess faculties since his opponents 
always took good care to lose to him.) 

How the Turk played is still something of an enigma, but there was certainly 
someone — a child or a midget — concealed in the chest. Elsewhere in the hall an 
expert player observed the game and signalled moves to the hidden accomplice. 
The Turk was exhibited successfully in Europe and America until its destruction 
by fire in 1854 (Bell, 1978). 

Almost two centuries later, in August 1968, David Levy, then the Scottish chess 
champion with an ELO rating of 2250, issued a challenge to Professor Donald 
Michie of the Machine Intelligence Department at Edinburgh University. He bet 
that no computer program would beat him across the board at chess within ten 
years. Michie accepted, indeed he raised the stakes, and the wager was on. The 
final match of the challenge was arranged, with considerable fanfare, at the 1978 
IFIP autumn conference in Toronto. Levy, by now an international master (ELO
rating 2340) and a specialist in computer gaming, duly won his bet by beating 
Chess 4.6 — the world computer chess champion. (Still, it would have beaten 
Napoleon.) 

Who were you rooting for — the man or the machine? If you were comforted 
by this tale of the human David vanquishing the inhuman Goliath you might be
interested to know that Levy appears to have gone over to the other side: he now
runs a firm called Philidor Software which produces game-playing programs. 


16.1. The Pygmalion factor 


Why do grown men get up to such capers? What is the fascination of a machine 
that plays a game which, in humans at least, requires intelligence? 

It can be argued that the time spent trying to construct a machine to play 
top-class chess is not wasted. For example, the ‘Pioneer’ chess planning algorithm 
developed by Dr Mikhail Botvinnik, former world chess champion and Professor 
of Electrical Engineering at Moscow University, has been adapted to schedule the 
repair and servicing of power stations throughout the USSR. It has also been 
argued that computer chess gives a valuable insight into the way people think. 
Against this I would point out that the current (1981) world computer chess 
champion is a piece of dedicated special-purpose hardware — scarcely a computer 
at all. It is no use for anything except chess, and certainly not meant as a model 
of the human brain. 

The motive for creating a program/machine to play games of mental skill lies 
deeper: it is the lure of Pygmalion. The prospect of losing a contest of wits to a 
machine both frightens and fascinates us, as HAL in Stanley Kubrick’s film 2001: 
A Space Odyssey well illustrates. That is why the story of Frankenstein and myth 
of Pygmalion exert such a powerful hold on our imaginations. 

I cannot justify this quasi-parental urge (except that it may be more urgent 
than we think for the human race to design its successor) but I can testify to its 
existence: when I finally managed to beat my own Go-Moku program, I was 
genuinely disappointed. 


16.2 Minimax look-ahead 


Those of you who are still reading are obviously converts (addicts?) so let us get 
down to business, and pleasure. 

Most programs for the type of game we are discussing have two main aspects — 
evaluation and look-ahead. Evaluation tells the system how good or bad a state of 
the game is, and look-ahead allows it to estimate the consequences of selecting 
different moves. The two processes are intertwined since the evaluation of the final 
stage in the look-ahead determines which move is chosen. The game of Noughts 
and Crosses (Tic-Tac-Toe) provides a simple example and allows us to introduce 
some terminology. 

Figure 16.1 is an example of a look-ahead ‘tree’. The nodes are board-states and 
the branches connecting them are legal moves. At the top is the current state, 
reached after 6 moves have been played already. The current state is said to be at 
ply 0, all states reached in one move are at ply 1, those reachable in two moves are 
at ply 2, and so on. At even ply White (O) is to move, at odd ply it is the turn of
Black (X). Board states which are won, lost or drawn are known as terminal nodes.
In Fig. 16.1 White is to play. There are three choices, leading to 3 nodes at ply 1, 6 at
ply 2 and 4 at ply 3. This gives 14 nodes in all, of which 6 are terminal (two wins,
two losses and two draws).


Figure 16.1 Look-ahead tree

In this instance terminal nodes are evaluated as follows: -1 for a loss, 0 for a
draw, +1 for a win. To evaluate the nodes at ply 1, and hence to decide which of
the three moves is best, the terminal values are ‘backed up’ by a process known as
minimaxing. This assumes that at even ply White picks the best move while at odd
ply Black picks the worst (from White’s viewpoint). By using this rule it is found
that the value at ply 0 is 0, and that the correct choice for White is the leftmost
move, into position 2. Any other move will lead to a loss, unless Black makes a
mistake.
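
The backing-up process can be sketched as a short recursive function. The program below is only an illustration under stated assumptions: the starting position is invented for the sketch (it is not the one in Fig. 16.1), the squares are numbered 1 to 9 row by row, and the names WON, MINIMAX, START, TOP and TOPMOVE are hypothetical. Values are from White's (O's) viewpoint, as in the text.

```pascal
program minimaxdemo;
   (* sketch: full minimax for Noughts and Crosses *)

type
   board = array [1..9] of char;

var
   start : board;
   m, score, top, topmove : integer;

function won (var b : board; who : char) : boolean;
   var k : integer;  w : boolean;
begin
   w := false;
   for k := 0 to 2 do
      begin
      if (b[3*k+1]=who) and (b[3*k+2]=who) and (b[3*k+3]=who) then
         w := true;   (* a row *)
      if (b[k+1]=who) and (b[k+4]=who) and (b[k+7]=who) then
         w := true;   (* a column *)
      end;
   if (b[1]=who) and (b[5]=who) and (b[9]=who) then w := true;
   if (b[3]=who) and (b[5]=who) and (b[7]=who) then w := true;
   won := w;
end;

function minimax (var b : board; whiteturn : boolean) : integer;
   (* +1 win, 0 draw, -1 loss, all from White's viewpoint *)
   var k, v, best : integer;  full : boolean;
begin
   if won(b,'O') then minimax := 1
   else if won(b,'X') then minimax := -1
   else
      begin
      full := true;
      if whiteturn then best := -2 else best := 2;
      for k := 1 to 9 do
         if b[k] = '.' then
            begin
            full := false;
            if whiteturn then b[k] := 'O' else b[k] := 'X';
            v := minimax(b, not whiteturn);
            b[k] := '.';   (* take the move back *)
            if whiteturn and (v > best) then best := v;        (* max *)
            if (not whiteturn) and (v < best) then best := v;  (* min *)
            end;
      if full then minimax := 0 else minimax := best;
      end;
end;

begin
   for m := 1 to 9 do start[m] := '.';
   start[1] := 'O';  start[2] := 'X';  start[3] := 'O';   (* O X O *)
   start[4] := 'X';  start[5] := 'X';                     (* X X . *)
   start[8] := 'O';                                       (* . O . *)
   top := -2;  topmove := 0;
   for m := 1 to 9 do
      if start[m] = '.' then
         begin   (* try each legal move for White *)
         start[m] := 'O';
         score := minimax(start, false);
         start[m] := '.';
         writeln('move ', m, ' has value ', score);
         if score > top then
            begin  top := score;  topmove := m;  end;
         end;
   writeln('best move is square ', topmove, ', value ', top);
end.
```

With the position assumed here White must block on square 6, which is worth 0 (a draw); the other two moves are worth -1, so the pattern is the same as in the figure.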

Of course this is a trivial example. In games of interesting complexity there are
far more possibilities at each ply. In chess there may be 30 alternative moves at
each ply, in Go-Moku over 200. To evaluate 30 x 30 x 30 x 30 (810 000) nodes for
a four-ply search is bad enough; to evaluate 200 x 200 x 200 x 200 (1 600 000 000)
would be out of the question.

Normal practice is to modify the strict minimax rule in two ways: (1) to 
evaluate non-terminal nodes; (2) to ignore many ‘uninteresting’ nodes. 

The first relaxation leads to imprecision. At a terminal node the value of the 
game is known. At a non-terminal node it can only be approximated. In chess or 
draughts this approximation might be computed by taking account of piece 
advantage, control of the centre, relative mobility of pieces and so on. The 
programmer selects features of the game that can be measured and assigns each 
feature a worth. Losing a queen in chess, for instance, is bad news and would 
contribute to a low evaluation score. 

The second relaxation is achieved by ‘pruning’ the tree. It means that the best 
move may be overlooked. There is a procedure, the Alpha-Beta algorithm, that is 
guaranteed to give the same result as full minimaxing while examining only a 
fraction of the nodes, but, as we shall see (Section 16.4), it is rarely sufficient 
on its own and some more drastic form of pruning is usually necessary. A favoured 
method is to rank the moves at each ply using the evaluation function: at ply 0 
only the best 50%, say, are examined, then at ply 1 only the top 25% are considered, 
at ply 2 only the top 12% and so on. Even this may not be enough to stave off the 
so-called ‘combinatorial explosion’ if the branching factor (number of options at 
each ply) is high, as in games like Go and Go-Moku. 
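
The arithmetic of this kind of pruning is easily checked. The sketch below compares the number of positions reached after three moves with and without the 50%, 25%, 12% scheme, taking a branching factor of 200 as for Go-Moku. The program and its variable names are invented for the illustration.

```pascal
program prunedemo;
   (* sketch: positions reached after 3 moves, unpruned and pruned *)

const
   branching = 200;   (* options at each ply, as for Go-Moku *)

var
   full, kept : real;   (* real, to avoid integer overflow *)

begin
   full := branching;
   full := full * branching * branching;   (* 200 x 200 x 200 *)
   kept := (branching * 0.50)              (* best 50% at ply 0 *)
         * (branching * 0.25)              (* top 25% at ply 1 *)
         * (branching * 0.12);             (* top 12% at ply 2 *)
   writeln('unpruned : ', full:12:0);
   writeln('pruned   : ', kept:12:0);
end.
```

The pruned figure is about 120 000 against 8 000 000, a large saving, yet each further ply still multiplies the work many times over.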

It is usual to look ahead not a fixed number of plys but until a quiescent or 
‘dead’ position is reached. This means in chess, for example, that after a capture 
move any sequence of re-captures would be followed through to its end. 


16.3. The Go-Moku program 


Let us focus our attention on a particular game, and return to these general ideas, 
hopefully with a better perspective, in a later section (16.4). 

Go-Moku is a simple game, said to originate in Japan. It is like Noughts and 
Crosses but much more interesting. The players take turns to place pieces in free 
squares on the board. The first one to obtain an unbroken line of 5 pieces — 
horizontal, vertical or diagonal — is the winner. 

It is normally played on a 19x19 grid, but to fit on the screen (and because 
the program is slow enough as it is!) our version uses a 15x15 grid. 

Go-Moku is slightly less complex than draughts (therefore far less complex 
than Chess or Go). It is ideal for us in that it is challenging enough for people to 
find it enjoyable while still being eminently suitable for computerization. 


16.3.1 The evaluation function 


This program uses an evaluation function and no look-ahead at all. In most game 
programs there is a trade-off between evaluation and look-ahead. You can either
have a sophisticated evaluation function and a shallow look-ahead or a coarse
evaluation function which is easier to compute and permits a deep look-ahead, 
but not both. Our look-ahead is as shallow as it can be for two reasons: (1) the 
branching factor in Go-Moku is enormous, with hundreds of choices at each node; 
(2) it is possible to devise a very accurate evaluation function. 

The present evaluation function takes as its starting point the idea of an 
incomplete line. A run of five pieces represents a win. Therefore four in a line, 
unless blocked at both ends, leads to an immediate win. The essence of the 
evaluation is that a 4 is better than a 3 which is better than a 2 which is better 
than a 1. This is depicted below. (Dots mark vacant positions: the asterisk is the 
proposed move, which must also be a vacant position of course.) 


     O O O O *
     . O O O *
     . . O O *
     . . . O *


The next step is to include opposing pieces. A good strategy is 


     1. make a 4 of your own into a 5
     2. block an enemy 4
     3. make a 3 into a 4
     4. block an enemy 3
     5. make a 2 into a 3
     6. block an enemy 2
     7. make a 1 into a 2
     8. block an enemy 1


on the understanding that if (1) cannot be accomplished then (2) is tried, and so on 
down the list. This leads to the extended ranking below. 


     O O O O *
     X X X X *
     . O O O *
     . X X X *
     . . O O *
     . . X X *
     . . . O *
     . . . X *


where the computer is O and its opponent X. 

Problems start to arise when a sequence is broken or blocked at one end. (If it is 
blocked at both ends it is impossible to make five, and so worth nothing.) For 
example, all the following are threesomes, but they are not all equally valuable. 


     O O O . *
     O O . O *
     O . O O *
     . O O O *


It should be clear that the bottom row is the most valuable, since it makes a 4
directly. The top three can all be spoilt by an enemy reply in the central empty 
position, but the last one is unbeatable, unless the square to the right of the 
asterisk is blocked. 

The possibility of blockage by enemy stones or the edge of the board must also 
be faced. The function must decide between the two patterns below, for example. 


     X O O * . .
     . O O * . .


Again the lower one is preferable because it may be extended in both directions, 
even though either one leads to a line of 3. 

A further complication arises when we consider two or more lines that converge 
on a point. Consider the two situations below. 


     . O O O * .          . . O O * .
                                  O
                                  O


Are two interlocking two’s as good as an isolated three? In the above case the three 
is better because it leads to a win in one move rather than 2; but if we had 


     . O O O * X          . . O O * .
                                  O
                                  O


the pair of two’s would be more promising. In fact it would win. So a single 
blockage is enough to tip the balance. 

To deal with these points four features are used in evaluating a potential line of 
five — i.e. a sequence of four squares adjoining the vacant square under consideration 
which is not interrupted by an edge or opposing piece. 


1. the number of blanks in the line (0..4)
2. whether the pieces form a contiguous run or have a ‘hole’
3. whether the nearest piece is adjacent to the square under consideration
4. whether the line is open at both ends or not


If you look at the function FOURSOME (called by function EVALUATE) you
will see how these are calculated for the configuration around any given square.
The program also makes allowance for which side the pieces belong to. The resultant
value of a single potential line of five is then given by the expression


     sqr(4-gaps) + together + next + open - 2*hostile


where gaps is the number of empty squares (so 4-gaps is the number of pieces).
The other variables are


     together   1 if the pieces are connected
                0 if they have a hole
     next       1 if the nearest piece is adjacent
                0 otherwise
     open       1 if both ends are unblocked
                0 otherwise
     hostile    1 if the pieces belong to the enemy
                0 for self’s pieces.


The dominant term is sqr(4-gaps) which will give 16 for a foursome, 9 for a 
threesome, 4 for a pair, 1 for a singleton and zero for an empty line. The other 
features are by way of minor adjustments, though the best pair will be better than 
the worst threesome. 

To give you an idea of how it works, we will compute the value of the line 
below 


     . . O . O * O X
         1 2 3   4


where the 4 significant squares are numbered. There is only one gap, so this is a 
threesome (9 points). It gets a bonus for adjacency, but nothing for togetherness 
as it has a hole and nothing for openness as it is blocked on the right. It is made 
up of self’s pieces so hostile=0. The net score is 10. Now try this one for yourself.


     . X X * . O
     1 2 3   4


You should get the result of 4. 
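
The expression is simple enough to check by machine. The following sketch is not taken from the listing (where FOURSOME works the features out for itself): LINEVALUE is a hypothetical function which is simply handed the four features. ADJACENT and UNBLOCKED stand for ‘next’ and ‘open’ in the expression above. The second call assumes an enemy pair that is connected, adjacent and blocked at one end, which is one combination of features giving the stated answer.

```pascal
program linedemo;
   (* sketch: value of one potential line of five *)

function linevalue (gaps : integer;
                    together, adjacent, unblocked, hostile : boolean)
                    : integer;
   var s : integer;
begin
   s := sqr(4 - gaps);               (* dominant term, 0..16 *)
   if together then s := s + 1;      (* no hole in the run *)
   if adjacent then s := s + 1;      (* 'next' in the text *)
   if unblocked then s := s + 1;     (* 'open' in the text *)
   if hostile then s := s - 2;       (* enemy pieces count for less *)
   linevalue := s;
end;

begin
   (* own threesome with a hole, adjacent, blocked on one side *)
   writeln(linevalue(1, false, true, false, false));   (* prints 10 *)
   (* enemy pair, connected, adjacent, blocked on one side *)
   writeln(linevalue(2, true, true, false, true));     (* prints 4 *)
end.
```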

This formula was arrived at by trial and error. It gives a strong game, but it is 
known not to be perfect. When you are sure you understand it, you can have some 
fun trying to improve it (see Section 16.4). 

To combine the values of lines in four directions converging on a single point,
a simple weighting scheme was used. The four values were sorted into descending
order in v (array [1..4] of integer) and the combined score computed as


     v[1]*64 + v[2]*16 + v[3]*4 + v[4] .


Thus the best line contributes 4 times as much as the second best, and so on down 
to the worst. These weights were chosen with care; but one way to see whether 
they can be improved would be to play programs using different weightings against 
one another until a champion emerged. 
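
As a check on the weighting, the sketch below sorts four line values and forms the weighted sum. The values 18, 4, 3 and 1 are those printed under the winning move in the sample run of Section 16.3.4, where the combined score appears as 1229. The function name COMBINE and the type FOURVALS are invented for this illustration; in the listing the same work is done inside EVALUATE.

```pascal
program combinedemo;
   (* sketch: combine four line values into one score *)

type
   fourvals = array [1..4] of integer;

var
   v : fourvals;

function combine (v : fourvals) : integer;
   var i,j,x : integer;
begin
   for i := 1 to 3 do            (* bubble sort, best first *)
      for j := 1 to 4-i do
         if v[j] < v[j+1] then
            begin
            x := v[j];  v[j] := v[j+1];  v[j+1] := x;
            end;
   combine := v[1]*64 + v[2]*16 + v[3]*4 + v[4];
end;

begin
   v[1] := 3;  v[2] := 18;  v[3] := 1;  v[4] := 4;   (* any order *)
   writeln(combine(v));   (* 18*64 + 4*16 + 3*4 + 1 = 1229 *)
end.
```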

With this evaluation function every empty square on the board can be assigned 
a numeric value, dependent on the pattern of pieces surrounding it. The program 
then selects the largest and moves to that position. This is equivalent to 1-ply
look-ahead, though it seems misleading to regard it as look-ahead at all.

Note that it is customary to evaluate board states (as we did with the 
Noughts and Crosses example) whereas this function evaluates individual moves. 
The value of the board, if required, can be taken as the value of the best move 
on it. 


16.3.2 Program structure


An overall view of the program can be gained from the following outline.


begin
   (* initialize everything *)
   (* inform user how to play *)
   repeat
      (* clear the board *)
      (* find out who starts *)
      repeat
         if (* human first or subsequent move *) then
            begin
            (* get person’s move *)
            (* test for a win *)
            end;
         if (* game not over *) then
            begin
            (* make computer’s move *)
            (* test for a win *)
            end;
         (* display updated board state *)
      until (* win, lose or draw *);
      (* congratulate the winner *)
   until (* enough games *);
end.


Notice that the board is displayed only after each machine move: this is quite 
sufficient. 

The subprograms for showing the board, getting the human player’s move, 
testing for a win etc. are comparatively straightforward. The most interesting 
procedure is MAKEMOVE which chooses the program’s next play. As explained 
this is done by calling EVALUATE (which uses FOURSOME) to evaluate every 
empty square and picking the one with the highest value. 


16.3.3 Program listing


Here is the complete Go-Moku program in Pascal/Z. 


program gomokutl;
   (* Go-Moku program by Richard Forsyth, PNL, Sep-81 *)


const
   edge = 0;  us = 1;  them = 2;  none = 3;   (* 4 board states *)
   gridsize = 15;    (* playing area is gridsize*gridsize *)
   maxmoves = 200;   (* longest permitted game *)
   alphabet = 96;    (* ASCII offset for numeric conversion *)
   null = ' ';


type
   squares = edge..none;   (* 4 kinds of square *)
   smallint = 0..gridsize;
   byte = 0..255;
   cardinal = 0..65535;
   line = array [0..9] of squares;


var
   grid : array [1..gridsize,1..gridsize] of squares;
      (* the game board is global to all subprograms *)
   name : array [squares] of char;
      (* printable form of board pieces *)
   icol,irow : array [1..4] of -1..1;
      (* directional increments *)
   play : packed array [1..maxmoves] of record
             rowfield,colfield : smallint;
          end;   (* for recording the game *)
   v,vals : array [1..4] of integer;
   i,j,r,c : byte;   (* row/col coordinates *)
   onboard : set of smallint;
   move : cardinal;
   endgame : squares;
   yourturn : boolean;   (* who goes first *)
   topvalue : integer;
   response : char;


procedure tell;
   (* gives user instructions *)
   var y : char;
begin
   writeln('Welcome to Go-Moku!');
   writeln;
   write('do you want instructions (n=no) ? ');
   readln(y);
   if y <> 'n' then
      begin
      writeln('The object is to get a line');
      writeln('of five consecutive pieces -');
      writeln('horizontally, vertically or diagonally.');
      writeln('You have ',name[them],' the computer has ',name[us],'.');
      writeln('We take turns to occupy vacant squares.');
      writeln('You indicate your next move as');
      writeln('column-letter followed by row-number.');
      writeln('Columns are ',chr(1+alphabet),' to ',
              chr(alphabet+gridsize),'.');
      writeln('The rows are 1 to ',gridsize:2,'.');
      writeln;  writeln('Good Luck!');
      end;
end;   (* of tell *)


procedure init;
   (* preliminary set-up *)
   var m : real;
begin
   (* first name the kinds of square *)
   name[none] := '.';
   name[us] := 'o';     (* program has noughts *)
   name[them] := 'x';   (* human has crosses *)
   name[edge] := '-';

   (* now set up directional increments *)
   irow[1] := 0;    icol[1] := -1;   (* west *)
   irow[2] := -1;   icol[2] := -1;   (* northwest *)
   irow[3] := -1;   icol[3] := 0;    (* north *)
   irow[4] := -1;   icol[4] := 1;    (* northeast *)
   (* only 4 directions needed, other 4 are opposite *)

   onboard := [1..gridsize];
end;   (* of init *)


procedure whofirst (var youfirst : boolean);
   (* decides who will have first move *)
   var no : char;
begin
   writeln;
   write('do you want to move first (n=no) ? ');
   readln(no);
   youfirst := no <> 'n';
end;   (* of whofirst *)


procedure slab (r,c,compass : smallint; var l : line);
   (* line of 10 centred on r,c in direction compass *)
   var i,j : integer;  k : smallint;
begin
   i := r;  j := c;
   for k := 4 downto 0 do   (* go west, or equivalent *)
      begin
      i := i + irow[compass];
      j := j + icol[compass];
      if (i in onboard) and (j in onboard) then
         l[k] := grid[i,j]   (* ok *)
      else
         l[k] := edge;   (* out of bounds *)
      end;
   i := r;  j := c;
   for k := 5 to 9 do   (* go east, or equivalent *)
      begin
      i := i - irow[compass];
      j := j - icol[compass];
      if (i in onboard) and (j in onboard) then
         l[k] := grid[i,j]
      else
         l[k] := edge;
      end;
end;   (* of slab *)


procedure remember (i,j : smallint);
   (* stores move for later processing *)
begin   (* move is global *)
   play[move].rowfield := i;
   play[move].colfield := j;
end;   (* of remember *)


procedure dumpgame (m : cardinal);
   (* displays moves of game afterwards *)
   var n : cardinal;
begin
   for n := 1 to m do
      with play[n] do
         begin
         write(chr(colfield+alphabet),rowfield:2);
         if odd(n) then write(' ')
                   else writeln;
         end;
end;   (* of dumpgame *)


function foursome (var span : line; self : squares) : integer;
   (* basic Go-Moku scoring routine *)
   var best : integer;  near : boolean;
       i,s,firstone,last,gaps : cardinal;
       friendly : set of squares;
begin
   best := 0;  friendly := [none,self];
   for i := 1 to 5 do   (* 5 possible foursomes *)
      begin
      firstone := 0;  last := 0;   (* ends of sequence *)
      gaps := 0;  near := false;
      s := i;   (* starting point *)
      while (gaps<4) and (s < i+4) do
         begin
         if span[s] = none then
            gaps := gaps+1   (* another blank *)
         else if span[s] = self then
            begin   (* own piece *)
            last := s;
            if firstone=0 then firstone := s;
            near := near or (s in [4,5]);
               (* adjacent *)
            end
         else   (* blockage, stop at once *)
            gaps := 4;
         s := s+1;
         end;
      (* now work out the score *)
      s := sqr(4-gaps);   (* 0..16 *)
      if (last-firstone) < (4-gaps) then
         s := s+1;   (* add 1 for togetherness *)
      if near then s := s+1;
         (* bonus for adjacency *)
      if [span[i-1],span[i+4]] <= friendly then
         s := s+1;   (* both ends open *)
         (* bonus if unblocked *)
      if s > best then best := s;   (* new optimum *)
      end;
   foursome := best;
end;   (* of foursome *)


function evaluate (r,c : smallint) : integer;
   (* evaluates position r,c *)
   var noughts,crosses,x : integer;  i,j,thisway : smallint;
       span : line;   (* v[] is global *)
begin
   for thisway := 1 to 4 do   (* 4 angles *)
      begin
      slab(r,c,thisway,span);
      noughts := foursome(span,us) + 2;   (* ours are better *)
      crosses := foursome(span,them);
      v[thisway] := max(noughts,crosses) - 2;
         (* range 0 to 19 *)
      end;
   (* now put in order of merit *)
   for i := 1 to 3 do
      for j := 1 to 4-i do
         if v[j] < v[j+1] then
            begin   (* swap them *)
            x := v[j];
            v[j] := v[j+1];
            v[j+1] := x;
            end;   (* best first *)
   (* final value is weighted sum *)
   evaluate :=
      v[1]*64 + v[2]*16 + v[3]*4 + v[4];
end;   (* of evaluate *)


procedure makemove (var r,c : smallint);
   (* makes move for machine *)
   var bestcol,bestrow : smallint;  e : integer;
begin
   topvalue := 0;   (* global for print & test *)
   bestrow := 0;  bestcol := 0;
   if move=1 then   (* start in middle *)
      begin
      bestrow := gridsize div 2 + 1;
      bestcol := bestrow;
      end
   else   (* subsequent moves *)
      for r := 1 to gridsize do
         for c := 1 to gridsize do
            if grid[r,c] = none then
               begin   (* only look at empty squares *)
               e := evaluate(r,c);
               if (e>topvalue) or (bestrow=0) then
                  begin
                  topvalue := e;
                  bestrow := r;
                  bestcol := c;
                  vals := v;   (* store values to show later *)
                  end;
               end;   (* of board search *)
   r := bestrow;  c := bestcol;
end;   (* of makemove *)


procedure getmove (var i,j : smallint);
   (* gets player's move *)
   var c : char;  ok : boolean;
       cols : integer;
begin
   writeln;
   repeat
      write('where do you move ? ');
      read(c);  readln(i);
         (* column letter, row number *)
      cols := ord(c) - alphabet;   (* convert to numeric *)
      ok := (i in onboard) and (cols in onboard);
      if not ok then
         writeln('no such position as ',c,i:2)
      else
         if grid[i,cols] <> none then
            begin
            ok := false;
            writeln('square ',c,i:2,' already occupied!');
            end;
   until ok;
   j := cols;
end;   (* of getmove *)


function test (r,c : smallint) : squares;
   (* tests whether r,c is a winning move *)
   var l : line;  stop : boolean;
       d,k,p : smallint;  mine : squares;
begin
   mine := grid[r,c];
   d := 1;   (* first compass direction *)
   repeat
      slab(r,c,d,l);   (* extract a line, direction d *)
      k := 1;
      stop := false;
      for p := 5 to 8 do
         if not stop and (l[p]=mine) then
            k := k+1
         else
            stop := true;
      stop := false;
      for p := 4 downto 1 do
         if not stop and (l[p]=mine) then
            k := k+1
         else
            stop := true;
      d := succ(d);   (* next way *)
   until (d > 4) or (k > 4);
   if k<5 then   (* no win *)
      test := none
   else
      test := mine;   (* 5 in a row, last player wins *)
end;   (* of test *)


procedure show (i,j,r,c : smallint);
   (* displays the board *)
   var n,rows : smallint;
begin
   write(chr(12),chr(29));   (* clear screen & home cursor *)
   writeln('State of play after',move:4,' moves:');
   write(null:4);
   for n := 1 to gridsize do
      write(chr(alphabet+n):2);
   writeln;
   for rows := 1 to gridsize do
      begin
      write(rows:2,null:2);
      for n := 1 to gridsize do
         write(null,name[grid[rows,n]]);
      writeln;
      end;
   if move > 1 then
      writeln('your last move was: ',chr(j+alphabet),i:2);
      (* move < 2 means person hasn't moved yet *)
   writeln('my latest move was: ',chr(c+alphabet),r:2);
      (* column letter, row number *)
   for n := 1 to 4 do write(vals[n]:8);
   writeln(topvalue:8);   (* show the score too *)
end;   (* of show *)


procedure message (wins : squares);
   (* congratulations at end of game *)
begin
   if wins = us then
      writeln('i win once again!!!')
   else if wins = them then
      writeln('Well done, you win!')
   else if wins = none then
      writeln('Amazing - a draw!!');
   (* should be no other case *)
end;   (* of message *)


begin   (* main line *)
   init;   (* initialize various global values *)
   tell;   (* inform user how to play *)
   repeat
      for r := 1 to gridsize do
         for c := 1 to gridsize do
            grid[r,c] := none;
            (* clear board for play *)
      whofirst(yourturn);   (* find out who starts *)
      move := 0;  endgame := none;

      repeat   (* main play cycle *)
         if yourturn then
            begin   (* human's turn *)
            move := move + 1;
            getmove(i,j);   (* person's move *)
            grid[i,j] := them;
            remember(i,j);   (* record the move *)
            endgame := test(i,j);   (* game over ? *)
            end
         else
            yourturn := true;   (* always true after start *)
         if endgame = none then   (* not over yet *)
            begin   (* computer's turn *)
            move := move + 1;
            makemove(r,c);   (* machine move *)
            grid[r,c] := us;
            remember(r,c);
            endgame := test(r,c);
            end;
         show(i,j,r,c);
            (* display board after machine's move *)
      until (endgame<>none) or (move>maxmoves);

      message(endgame);
         (* congratulate the winner *)
      write('want to see moves ? ');
      readln(response);
      if response = 'y' then dumpgame(move);
      write('another game (n=no) ? ');
      readln(response);
   until response = 'n';
end.


16.3.4 Sample run


Below is a sample game in which the program comfortably beat its author. The 
program shows the board state on the screen after every other move, but to save 
space the complete display is only printed here after the 7th, 17th and 21st (final) 
move. At the 17th a win for the program became inevitable. 

(Well you hardly expected me to show it losing, did you? If you want to beat it 
you will have to find out the hard way!) 


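MACHINE STARTS AND TAKES CENTRE SQUARE.


H8  F8
G7  F6
I9  F10
J10


STATE OF PLAY AFTER   7 MOVES:
     A B C D E F G H I J K L M N O
 1   . . . . . . . . . . . . . . .
 2   . . . . . . . . . . . . . . .
 3   . . . . . . . . . . . . . . .
 4   . . . . . . . . . . . . . . .
 5   . . . . . . . . . . . . . . .
 6   . . . . . X . . . . . . . . .
 7   . . . . . . O . . . . . . . .
 8   . . . . . X . O . . . . . . .
 9   . . . . . . . . O . . . . . .
10   . . . . . X . . . O . . . . .
11   . . . . . . . . . . . . . . .
12   . . . . . . . . . . . . . . .
13   . . . . . . . . . . . . . . .
14   . . . . . . . . . . . . . . .
15   . . . . . . . . . . . . . . .
YOUR LAST MOVE WAS: F10
MY LATEST MOVE WAS: J10
      11       1       1       1     725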


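     K11
F7   F11
F9   G11
H7   E7
I7   J7
H9


STATE OF PLAY AFTER  17 MOVES:
     A B C D E F G H I J K L M N O
 1   . . . . . . . . . . . . . . .
 2   . . . . . . . . . . . . . . .
 3   . . . . . . . . . . . . . . .
 4   . . . . . . . . . . . . . . .
 5   . . . . . . . . . . . . . . .
 6   . . . . . X . . . . . . . . .
 7   . . . . X O O O O X . . . . .
 8   . . . . . X . O . . . . . . .
 9   . . . . . O . O O . . . . . .
10   . . . . . X . . . O . . . . .
11   . . . . . X X . . . X . . . .
12   . . . . . . . . . . . . . . .
13   . . . . . . . . . . . . . . .
14   . . . . . . . . . . . . . . .
15   . . . . . . . . . . . . . . .
YOUR LAST MOVE WAS: J 7
MY LATEST MOVE WAS: H 9
       7       6       3       3     559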


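     H10
G9   E9
J9


STATE OF PLAY AFTER  21 MOVES:
     A B C D E F G H I J K L M N O
 1   . . . . . . . . . . . . . . .
 2   . . . . . . . . . . . . . . .
 3   . . . . . . . . . . . . . . .
 4   . . . . . . . . . . . . . . .
 5   . . . . . . . . . . . . . . .
 6   . . . . . X . . . . . . . . .
 7   . . . . X O O O O X . . . . .
 8   . . . . . X . O . . . . . . .
 9   . . . . X O O O O O . . . . .
10   . . . . . X . X . O . . . . .
11   . . . . . X X . . . X . . . .
12   . . . . . . . . . . . . . . .
13   . . . . . . . . . . . . . . .
14   . . . . . . . . . . . . . . .
15   . . . . . . . . . . . . . . .
YOUR LAST MOVE WAS: E 9
MY LATEST MOVE WAS: J 9
      18       4       3       1    1229
I WIN ONCE AGAIN!!!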


The numbers under the program’s latest move show how it evaluated that move.


16.4 The next step


Now it is your turn. This program plays a competent game, but it is not unbeatable. 
So there is still scope to express your creativity. Once you have figured out how to 
beat it, the next challenge is to produce a better one. Here are some hints. 


16.4.1 Usability 


The program could easily be made a little friendlier. Firstly, since it records each 
move played, it could allow the user to un-play moves. For example X5 typed 
instead of a proper move could be a code meaning: go back and remove the last 
5 moves from the grid and then re-display the board. This modification would allow 
the machine’s opponent to cheat as well as to correct mistakes, but humans have 
consciences while machines do not. 

Secondly the machine could let its opponent know what move it would make 
if it were to play. A sequence like 


flipgrid;   (* invert board *)
(* choose own move but do not make it *)
flipgrid;   (* restore board *)


would do the trick (where the procedure FLIPGRID changes every O on the board
to an X and vice versa). Then the computer could see the game from the other’s
standpoint and produce a recommended move along with its evaluation score. With 
a little extra work it could also be made to give its opinion on the move the 
opponent actually does make — though this might be rather disconcerting! (These 
features ought to be suppressed for tournament play.) 
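
FLIPGRID itself is not given in the listing. A minimal sketch of how it might be written is shown below, as a free-standing program which repeats just enough of the Go-Moku declarations to run on its own. The test positions are arbitrary.

```pascal
program flipdemo;
   (* sketch: one possible FLIPGRID, swapping the two sides' pieces *)

const
   edge = 0;  us = 1;  them = 2;  none = 3;   (* as in the listing *)
   gridsize = 15;

type
   squares = edge..none;

var
   grid : array [1..gridsize,1..gridsize] of squares;
   r,c : integer;

procedure flipgrid;
   var r,c : integer;
begin
   for r := 1 to gridsize do
      for c := 1 to gridsize do
         if grid[r,c] = us then
            grid[r,c] := them
         else if grid[r,c] = them then
            grid[r,c] := us;   (* empty squares are left alone *)
end;

begin
   for r := 1 to gridsize do
      for c := 1 to gridsize do
         grid[r,c] := none;
   grid[8,8] := us;  grid[8,6] := them;
   flipgrid;
   writeln(grid[8,8], ' ', grid[8,6]);   (* prints 2 1 *)
   flipgrid;                             (* flipping twice restores *)
   writeln(grid[8,8], ' ', grid[8,6]);   (* prints 1 2 *)
end.
```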

Little touches like these win the user’s loyalty. 


16.4.2 Efficiency 


The evaluation routine was designed for comfort, not for speed. You may be able 
to find a way of extracting the same information with less work. For instance, is it 
really necessary to call FOURSOME twice on each line — once for noughts and 
once for crosses? After all if the score is over 9 or under 2 for noughts there is no 
point in looking at it from the other side’s viewpoint. 

In the early part of the game the program wastes a lot of time looking in 
completely blank areas around the periphery of the board. If it kept a record of 
the highest and lowest row and column used so far it could ignore most of these. 
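
One possible way of keeping such a record is sketched below. The names LOROW, HIROW, LOCOL, HICOL and NOTE are assumptions, as is the MARGIN of two squares searched beyond the occupied area.

```pascal
program boxdemo;
(* sketch only: lorow, hirow, locol, hicol, note and margin are assumed names *)
const
  size = 15;
  margin = 2;      (* assumed: how far beyond the occupied area to look *)
var
  lorow, hirow, locol, hicol : integer;
  r1, r2, c1, c2 : integer;

function imax (a,b : integer) : integer;
begin if a > b then imax := a else imax := b end;

function imin (a,b : integer) : integer;
begin if a < b then imin := a else imin := b end;

procedure note (r,c : integer);
(* call after every move to widen the occupied rectangle *)
begin
  lorow := imin(lorow,r);   hirow := imax(hirow,r);
  locol := imin(locol,c);   hicol := imax(hicol,c);
end;

begin
  lorow := size;  hirow := 1;  locol := size;  hicol := 1;   (* empty box *)
  note(8,8);   note(7,9);   note(9,10);
  (* limits for the evaluation loops, clipped to the board *)
  r1 := imax(1, lorow - margin);   r2 := imin(size, hirow + margin);
  c1 := imax(1, locol - margin);   c2 := imin(size, hicol + margin);
  writeln('scan rows ', r1:1, ' to ', r2:1, ', columns ', c1:1, ' to ', c2:1);
end.
```

The evaluation loops would then run from R1 to R2 and from C1 to C2 rather than over the whole board.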

Other amendments will no doubt occur to you as you try to improve the 
evaluation function. 


16.4.3 Adding alpha-beta search 


A modification which would bring this program back into the mainstream tradition 
of computer gaming would be to look ahead further than one ply — using alpha-beta 
minimaxing. 
The alpha-beta principle augments standard minimaxing with the following pair 
of rules: 


(1) if the value of a minimizing position (opponent to move) is found to be less 
than or equal to the alpha value of its parent, do not consider its remaining 
offspring; 

(2) if the value of a maximizing position (self to move) is found to be greater 
than or equal to the beta value of its parent, do not examine its remaining 
offspring. 


Alpha (beta) is a parameter which is updated every time a new maximum (minimum) 
is found. Suitable starting values for alpha and beta are -infinity and +infinity 
respectively. 

My own feeling is that Go-Moku is very much a pattern-based game, and therefore 
a strong evaluation function is a first priority; but unless the evaluation function is 
absolutely perfect, look-ahead will improve the program’s performance. So here is 
one way to incorporate a look-ahead with alpha-beta cutoff. 

First we declare the types 


pptr = ^position; 
position = record 
             r,c : smallint; 
             sons,brothers : pptr; 
           end; 


and we write a tree-creation procedure that will build a linked structure of proposed 
moves rather as in Fig. 16.2. 

The brothers of a particular move are the alternatives to it at the same ply, 
preferably ranked in descending order according to the evaluation function. The 
sons of each move are the possible replies to that move at the next lower ply, ranked 
in descending order from the opposing viewpoint (i.e. ascending order). 


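[Figure 16.2 Partial game tree. A move (r1, c1) at ply N is linked to the next 
best move at ply N and to the best descendant move from r1, c1, namely (r2, c2) 
at ply N+1; that move is linked in turn to the next best move from r2, c2 and 
to the best descendant move from r2, c2 at ply N+2.] 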
The tree-creation routine must become stricter as the ply number increases. For 
instance at ply 1 all moves making a pair (or better) might be considered, at ply 2 
only moves making a triple (or better) would be worth generating, and so on 
downwards. Since five is a win this would limit depth to 5 ply. While generating 
the tree the procedure must assume for each move that its parent and all ancestors 
are on the board but none of its brothers or sons (or cousins, uncles, nephews, 
etc.). If the procedure calls itself recursively to generate the sons, then by 
placing a piece on the board just prior to the recursive call and removing it 
immediately afterwards this will happen automatically. 

Now we come to the tricky bit. (You thought that was bad enough already? 
Sorry.) 


function minimize (p : pptr; deep,a,b : integer) : integer; 
  forward;          (* wait for it *) 


function maximize (p : pptr; deep,a,b : integer) : integer; 
var eval : integer; 
begin 
  if (p^.sons = nil) or (deep >= maxdepth) then 
    begin 
      eval := goodness(p);             (* static evaluation *) 
      if (eval > maxsofar) and (deep = 1) then 
        keep(p,eval);                  (* preserve best so far at top level *) 
    end 
  else 
    begin 
      repeat 
        grid[p^.r,p^.c] := us;         (* play move p *) 
        eval := minimize(p^.sons,deep+1,a,b); 
        grid[p^.r,p^.c] := none;       (* remove p from board *) 
        if (eval > maxsofar) and (deep = 1) then 
          keep(p,eval);                (* must be done before p moves on *) 
        a := max(a,eval);              (* new alpha value *) 
        p := p^.brothers; 
      until (p = nil) or (a >= b); 
      eval := a; 
    end; 
  maximize := eval; 
end; (* maximize *) 


function minimize; 
var eval : integer; 
begin 
  if (p^.sons = nil) or (deep >= maxdepth) then 
    eval := goodness(p)                (* static evaluation *) 
  else 
    begin 
      repeat 
        grid[p^.r,p^.c] := them;       (* play opponent's move p *) 
        eval := maximize(p^.sons,deep+1,a,b); 
        grid[p^.r,p^.c] := none;       (* remove p from board *) 
        b := min(b,eval);              (* update beta *) 
        p := p^.brothers; 
      until (p = nil) or (b <= a); 
      eval := b; 
    end; 
  minimize := eval; 
end; (* of minimize *) 


The function GOODNESS is for evaluating a potential move. It can be very like the 
evaluation function already discussed at length except that it should always work 
from white’s point of view and therefore give equal and opposite values to runs of 
black pieces. 

KEEP is just to ensure that the best move at ply 1 is retained: there is no point 
in working out the value of the current board state and then forgetting which 
move led to it! 

A FORWARD declaration is used to split MINIMIZE into two parts. This is 
needed because the two functions are mutually recursive: each calls the other. 
(Refer back to Chapter 7.) 

The whole process is set in motion at the top level by something like 


maxsofar := -topscore; 
boardval := maximize(p,1,-topscore,topscore); 


where topscore is the highest possible evaluation, used to initialize the alpha and 
beta values which control the search. P points to the root of the tree constructed 
by the generation routine (as per Fig. 16.2). If you think hard enough you may 
also be able to work out how to combine generation of the look-ahead tree with 
the minimaxing process. 
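
To watch a cutoff happen without a Go-Moku board, the principle can be tried on a tiny hand-built tree. The sketch below is not the MAXIMIZE/MINIMIZE pair above: it is a simplified single function in which leaf scores are stored in the nodes rather than computed by GOODNESS, and all its names (NODE, LEAF, INNER, SEARCH, VISITED) are assumptions made for the illustration.

```pascal
program abdemo;
(* sketch only: scores are stored at the leaves instead of being
   computed from a board; every name here is an assumption *)
const topscore = 1000;
type
  nptr = ^node;
  node = record
           score : integer;          (* used only at leaves *)
           sons, brothers : nptr;
         end;
var
  root, a, b : nptr;
  visited : integer;                 (* number of leaves evaluated *)

function leaf (s : integer; next : nptr) : nptr;
var n : nptr;
begin
  new(n);   n^.score := s;   n^.sons := nil;   n^.brothers := next;
  leaf := n;
end;

function inner (kids, next : nptr) : nptr;
var n : nptr;
begin
  new(n);   n^.score := 0;   n^.sons := kids;   n^.brothers := next;
  inner := n;
end;

function search (n : nptr; maximizing : boolean;
                 alpha, beta : integer) : integer;
var s : nptr;  v : integer;
begin
  if n^.sons = nil then
    begin
      visited := visited + 1;
      search := n^.score;
    end
  else
    begin
      s := n^.sons;
      repeat
        v := search(s, not maximizing, alpha, beta);
        if maximizing then
          begin if v > alpha then alpha := v end
        else
          begin if v < beta then beta := v end;
        s := s^.brothers;
      until (s = nil) or (alpha >= beta);    (* alpha-beta cutoff *)
      if maximizing then search := alpha else search := beta;
    end;
end;

begin
  visited := 0;
  (* root (self to move) has two replies: one leading to leaves 3 and 5,
     the other to leaves 2 and 9 *)
  b := inner(leaf(2, leaf(9, nil)), nil);
  a := inner(leaf(3, leaf(5, nil)), b);
  root := inner(a, nil);
  writeln('value = ', search(root, true, -topscore, topscore):1);
  writeln('leaves examined = ', visited:1);
end.
```

The first reply is worth 3 to the maximizer. As soon as the second reply is seen to allow a score of 2, it cannot be better than the first, so the leaf holding 9 is never examined: three leaves are evaluated instead of four.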

For more discussion of the rationale behind the Alpha-Beta algorithm see the 
books by Bell or Winston (Bell, 1972; Winston, 1977). 


16.4.4 Learning 


It would be really splendid to have a program that started off knowing nothing 
about Go-Moku strategy, played a number of games and ended up beating all 
comers. We are not likely to get that, but we could shift some of the burden of 
improving its play onto the computer itself. 
Quite a lot can be done with simple parameter adjustment. In Section 16.2.1 it 
was pointed out that the relative weights for each of four lines converging on a 
point could be tested experimentally. It is easy enough to arrange a tournament in 
which versions of the program using different weightings all play each other (once 
having first move, once starting second) and receive +1 for a win, 0 for a draw and 
-1 for a loss. Then the points can be totalled up: the one with the highest total 
presumably has the best parameters. Tabulation of the results might reveal a trend 
so that missing combinations could be estimated reliably. 

Of course with judicious scaling of the single-line evaluations (such that a single 
foursome swamps any number of threesomes but several good twosomes are 
equivalent to a poor threesome) it might be possible to avoid the ordering and 
weighting altogether by simply adding up the four separate scores. But similar 
remarks on parameter-adjustment apply to the scaling of elementary features 
within the evaluation function itself. 

Another simple kind of learning is rote memorization. Since this program uses 
only four features (gaps, together, next, open) to categorize each line it could 
store all conceivable combinations (and some inconceivable ones) in a 5 x 2 x 2 x 2 
table of 40 elements — or two 40-element arrays, one for black and one for white. 
Each cell in the table could be used to record how often the pattern had been met 
on the way to a loss (L) and how often on the way to a win or draw (W). A rough 
estimate of its value would then be W / (W +L). 
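
A minimal sketch of such a table follows. It assumes that GAPS is the feature with five values (0 to 4) and that the other three are 0 or 1; the names WINS, LOSSES, CREDIT and WORTH, and the default of 0.5 for a pattern never yet met, are likewise assumptions.

```pascal
program rotedemo;
(* sketch only: index ranges follow the 5 x 2 x 2 x 2 table in the text;
   wins, losses, credit, worth and the 0.5 default are assumptions *)
type
  tally = array [0..4, 0..1, 0..1, 0..1] of integer;
var
  wins, losses : tally;
  g,t,n,o : integer;

procedure credit (gaps,tog,nxt,opn : integer; won : boolean);
(* records one meeting with a pattern on the way to a win/draw or a loss *)
begin
  if won then
    wins[gaps,tog,nxt,opn] := wins[gaps,tog,nxt,opn] + 1
  else
    losses[gaps,tog,nxt,opn] := losses[gaps,tog,nxt,opn] + 1;
end;

function worth (gaps,tog,nxt,opn : integer) : real;
(* W / (W + L), or 0.5 for a pattern never yet seen *)
var w,l : integer;
begin
  w := wins[gaps,tog,nxt,opn];
  l := losses[gaps,tog,nxt,opn];
  if w + l = 0 then worth := 0.5
  else worth := w / (w + l);
end;

begin
  for g := 0 to 4 do
    for t := 0 to 1 do
      for n := 0 to 1 do
        for o := 0 to 1 do
          begin wins[g,t,n,o] := 0;   losses[g,t,n,o] := 0 end;
  credit(1,1,0,1,true);   credit(1,1,0,1,true);   credit(1,1,0,1,false);
  writeln(worth(1,1,0,1):6:3);     (* two wins out of three meetings *)
  writeln(worth(0,0,0,0):6:3);     (* unseen pattern *)
end.
```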

The author has tried getting two subprograms based on such a scheme to play 
each other repeatedly, but progress was agonizingly slow. The main problem is 
identifying which of the loser’s moves were bad. Only one blunder may lead to a 
loss: the other moves might be perfect, but their score will still drop. Likewise, 
when one side plays badly and yet wins, as happens very frequently in practice, all 
its moves are treated as good ones. 

A simple scheme of this nature is unlikely to be adequate. It would be better for 
the losing program to play the winner’s game backwards asking at each move: would 
I have made that move? At the point where the answer is no, some adjustment is 
needed so that the value of the pattern leading to the move the winner played is 
made greater than that of the one the program would have selected (but without 
upsetting all the other relativities!). 

Certainly the fact that the computer can play itself all day and all night if 
necessary and re-trace accurately all the steps leading to victory or defeat allows 
it to amass great volumes of experience. The problem is making intelligent use of 
that experience. The question of assigning responsibility is crucial. When we learn 
from our mistakes in this sort of situation, we typically look back over the course 
of the game and say “aha! that’s where I went wrong, I should have gone there 
instead”. How can a mere machine do this? Perhaps we should not expect it to do 
so. After all we do not expect children to learn entirely without guidance from 
a teacher. 


This suggests a cooperative enterprise in which a human expert re-plays the 
game with the program after it has lost and, where the expert judges that it made 
a poor move, points out the fact and suggests a better one. To profit from this 
kind of tuition the machine requires a way of describing the features of a move 
and a method of induction. Description allows it to tell when two board patterns 
are, for the purposes of the game, the same or different. Induction works in two 
ways — by generalization and by discrimination. The program should always try 
to generalize where possible, as this avoids storing a mass of particular cases; but 
it should be able to detect over-generalization and correct it by discriminating 
more finely. 

Let us suppose that it has encountered two configurations and been told that 
they are both very good. In one there are 4 pieces in a row to the north, in 
another there are 4 to the west. It should hypothesize, until evidence arrives to 
the contrary, that 4 pieces in any direction are very good. This is the common 
feature. On the other hand, if it has met several instances of three in a row and 
always been told they are good, when it meets a threesome that is rated as less 
good by the teacher it should seek another feature, apart from the number of 
pieces, that enables it to discriminate the present case from the others. This 
feature might be a blockage, which was absent from the previous cases. 

To do anything like this requires a notation for representing any pattern of 
four lines radiating from an empty square in an economical manner. If we adopt 
the features previously discussed, and impose a canonical ordering, e.g. so that 
the line with the greatest number of pieces comes first irrespective of its direction, 
the number of patterns is greatly reduced as compared to raw board-states, though 
still huge. Further reduction is achieved through insertion of ‘don’t-care’ fields by 
the generalization procedure. For instance in any pattern with 4 own pieces the 
other features are immaterial. Hence a large number of patterns are covered by 
one description. To give an idea of what is required, using the features proposed 
in Section 16.2, we might start with a partial pattern ranking list of 


(gaps=0) 
(gaps=1) 


which would have developed, some time later, into 


(gaps=0,hostile=0) 
(gaps=0,hostile=1) 
(gaps=1,hostile=0,open=1) 
(gaps=1,hostile=0,open=0) 
(gaps=1,hostile=1,open=1,together=1) 
(gaps=1,hostile=1,open=1,together=0) 
(gaps=1,hostile=1,open=0,together=0) 


where features not explicitly mentioned are assumed to be irrelevant. The program 
begins by attempting maximal generalization and only discriminates when it is 
forced to do so. 
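
To make the notion of a don't-care field concrete, here is one way a ranked list of single-line patterns might be stored and searched. It is a sketch under assumptions: the value -1 for a don't-care field and the names PATTERN, RULES, MATCHES and RANK are invented for the example, and OPN stands for the feature called open in the text.

```pascal
program patdemo;
(* sketch only: dontcare = -1 and the names pattern, rules, matches
   and rank are assumptions; opn stands for the feature 'open' *)
const
  dontcare = -1;
  nrules = 3;
type
  feature = (gaps, hostile, opn, together);
  pattern = array [feature] of integer;
var
  rules : array [1..nrules] of pattern;     (* ranking list, best first *)
  line : pattern;
  f : feature;
  i : integer;

function matches (var rule, actual : pattern) : boolean;
(* a rule matches if every field it cares about agrees *)
var f : feature;  ok : boolean;
begin
  ok := true;
  for f := gaps to together do
    if (rule[f] <> dontcare) and (rule[f] <> actual[f]) then ok := false;
  matches := ok;
end;

function rank (var actual : pattern) : integer;
(* position of first matching rule, or nrules+1 if none matches *)
var i : integer;  found : boolean;
begin
  i := 1;   found := false;
  while (i <= nrules) and not found do
    if matches(rules[i], actual) then found := true
    else i := i + 1;
  rank := i;
end;

begin
  for i := 1 to nrules do
    for f := gaps to together do rules[i][f] := dontcare;
  rules[1][gaps] := 0;   rules[1][hostile] := 0;    (* (gaps=0,hostile=0) *)
  rules[2][gaps] := 0;   rules[2][hostile] := 1;    (* (gaps=0,hostile=1) *)
  rules[3][gaps] := 1;   rules[3][hostile] := 0;   rules[3][opn] := 1;
  line[gaps] := 0;   line[hostile] := 1;   line[opn] := 0;   line[together] := 1;
  writeln('rank = ', rank(line):1);
end.
```

The line tested here agrees with the second rule on the two fields that rule mentions, so its rank is 2 whatever its other features may be. Discrimination then amounts to splitting one rule into two by filling in a field that was previously don't-care.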
The fact that the above list only deals with single lines, neglecting the 4-way 
case, should give you some grasp of the complexities involved. If you are still not 
deterred, you can spend many happy hours devising a good description language 
and the rules to manipulate it for your favourite board game. Papers by Elcock 
and Murray, Levy and Samuel will give you some additional ideas (Samuel, 1967; 
Elcock and Murray, 1968; Levy, 1980; Levy, 1981). 

A final word: if you do incorporate a learning mechanism be sure to switch off 
learning mode when giving exhibitions. You do not want the program picking up 
bad habits from poor players! 


PART FOUR 


Pascal at a Glance 


‘... that gruesome way of perishing, of which Pascal is the most 
famous example.’ Friedrich Wilhelm Nietzsche 


APPENDIX A 


ASCII code 


Here is a complete listing of the ASCII character code. All respectable computer 
systems support it: if yours does not, complain to your supplier or data-processing 
manager. 


no.  char    no.  char    no.  char    no.  char 

  0  nul      32  sp       64  @        96  ` 
  1  soh      33  !        65  A        97  a 
  2  stx      34  "        66  B        98  b 
  3  etx      35  #        67  C        99  c 
  4  eot      36  $        68  D       100  d 
  5  enq      37  %        69  E       101  e 
  6  ack      38  &        70  F       102  f 
  7  bel      39  '        71  G       103  g 
  8  bs       40  (        72  H       104  h 
  9  ht       41  )        73  I       105  i 
 10  lf       42  *        74  J       106  j 
 11  vt       43  +        75  K       107  k 
 12  ff       44  ,        76  L       108  l 
 13  cr       45  -        77  M       109  m 
 14  so       46  .        78  N       110  n 
 15  si       47  /        79  O       111  o 
 16  dle      48  0        80  P       112  p 
 17  dc1      49  1        81  Q       113  q 
 18  dc2      50  2        82  R       114  r 
 19  dc3      51  3        83  S       115  s 
 20  dc4      52  4        84  T       116  t 
 21  nak      53  5        85  U       117  u 
 22  syn      54  6        86  V       118  v 
 23  etb      55  7        87  W       119  w 
 24  can      56  8        88  X       120  x 
 25  em       57  9        89  Y       121  y 
 26  sub      58  :        90  Z       122  z 
 27  esc      59  ;        91  [       123  { 
 28  fs       60  <        92  \       124  | 
 29  gs       61  =        93  ]       125  } 
 30  rs       62  >        94  ^       126  ~ 
 31  us       63  ?        95  _       127  del 

The parity bit (most significant) is assumed set to zero in this table. 

ASCII codes below 33, and number 127, are control characters. They are meant 
to have standard meanings, but in practice their functions vary from system to 
system. The ones which are most nearly universal are: 


  0   nul   null character 
  7   bel   bell or buzzer 
  8   bs    backspace 
  9   ht    horizontal tab 
 10   lf    line feed 
 12   ff    form feed (new page or clear screen) 
 13   cr    carriage return 
 27   esc   escape 
 32   sp    space 
127   del   delete/rubout. 


APPENDIX B 


Basic Pascal facts 


There follows a quick reference guide to Pascal. For a much more thorough 
treatment see The Pascal Handbook (Tiberghien, 1980). 


B1 Special symbols 


Reserved words 

AND        END         NOT          SET 
ARRAY      FILE        OF           THEN 
BEGIN      FOR         OR           TO 
CASE       FUNCTION    OTHERWISE    TYPE 
CONST      GOTO        PACKED       UNTIL 
DIV        IF          PROCEDURE    VAR 
DO         IN          PROGRAM      WHILE 
DOWNTO     LABEL       RECORD       WITH 
ELSE       MOD         REPEAT 


Standard identifiers 

ALFA       FALSE       MAXINT       REAL 
BOOLEAN    INPUT       NIL          TEXT 
CHAR       INTEGER     OUTPUT       TRUE 


Compound symbols 

<>   <=   >=   (* 


Punctuation marks 

+   -   *   /   <   >   =   ( 

B2 Pascal operators 


Group        Operator   Description        Priority   L-type    R-type    Result 

Arithmetic   +A         identity               4                numeric   ditto 
             -A         sign inversion         4                numeric   ditto 
             A*B        multiplication         3      numeric   numeric   ditto 
             A/B        real division          3      numeric   numeric   real 
             A DIV B    quotient               3      integer   integer   integer 
             A MOD B    remainder              3      integer   integer   integer 
             A+B        addition               2      numeric   numeric   ditto 
             A-B        subtraction            2      numeric   numeric   ditto 

Relational   A<B        less than              1      ordinal   ordinal   boolean 
             A>B        greater than           1      ordinal   ordinal   boolean 
             A<=B       less or equal          1      ordinal   ordinal   boolean 
             A>=B       greater than           1      ordinal   ordinal   boolean 
                        or equal to 
             A=B        equality               1      any       any       boolean 
             A<>B       inequality             1      any       any       boolean 

Logical      NOT A      negation               4                boolean   boolean 
             A AND B    conjunction            3      boolean   boolean   boolean 
             A OR B     disjunction            2      boolean   boolean   boolean 

Set          A*B        intersection           3      set       set       set 
             A+B        union                  2      set       set       set 
             A-B        difference             2      set       set       set 
             A IN B     membership             1      scalar    set       boolean 


Expressions of high priority are evaluated before those of lower priority; for 
equal priority the order is left to right. 


‘Numeric’ means integer or real; ‘ordinal’ means any scalar type except REAL; 
‘ditto’ means the same type as the operands or real if either is real; ‘scalar’ means 
any type that is not structured. Equality and inequality cannot be applied to files, 
but can to any other type. 


B3 Predefined functions 


Name      Argument   Result    Description 

ABS       numeric    ditto     absolute value 
ARCTAN    numeric    real      arctangent (result is an angle in radians) 
*CARD     set        integer   no. of members in the set 
CHR       integer    char      character with given code no. 
*CLOCK               integer   runtime in milliseconds 
COS       numeric    real      cosine of angle in radians 
EOF       file       boolean   true if past end of data 
EOLN      text       boolean   true if past end of line 
EXP       numeric    real      e raised to given power 
*FIRST    ordinal    ditto     lowest value of given type 
*LAST     ordinal    ditto     highest value in given type 
LN        numeric    real      natural logarithm of arg. 
*MAX      scalar     ditto     larger of two given values 
*MIN      scalar     ditto     lesser of two arguments 
ODD       integer    boolean   true if argument is odd 
ORD       ordinal    integer   position of arg. in its type 
PRED      ordinal    ordinal   predecessor of argument 
ROUND     real       integer   rounds arg. to nearest integer 
SIN       numeric    real      sine of angle in radians 
SQR       numeric    ditto     square of given number 
SQRT      numeric    real      square root of given number 
SUCC      ordinal    ditto     successor of given value 
TRUNC     real       integer   truncates arg. towards zero 


[* Not found on all systems.] 


‘Numeric’, ‘ordinal’, ‘scalar’ and ‘ditto’ used as in Section B2. 
B4 Predefined procedures 


Name       Argument(s)   Description 

*DATE      alfa          assigns date to arg. as character string 
*DISPOSE   pointer       frees storage 
GET        file          advances file pointer and fills buffer 
*HALT                    stops execution 
NEW        pointer       allocates storage 
PACK       U, I, P       packs from position I of U into P 
PAGE       text          feeds new page on text file 
PUT        file          appends buffer contents to file 
READ       text,...      see Chapter 5 
READLN     text,...      see Chapter 5 
RESET      file          opens file for reading from start 
REWRITE    file          opens file for writing from start 
*TIME      alfa          assigns time of day to arg. as characters 
UNPACK     P, U, I       unpacks P into U from position I onwards 
WRITE      text,...      see Chapter 5 
WRITELN    text,...      see Chapter 5 


[* Not found on all systems.] 

[P packed array; U unpacked array of same base type; I index type of arrays.] 


‘Numeric’, ‘ordinal’, ‘scalar’ and ‘ditto’ used as in Section B2. ‘Alfa’ refers to a 
packed 10-character vector, ‘pointer’ to any pointer type. 


APPENDIX C 


Catalogue of software 


This provides an index to all the examples in the book. Subprograms embedded 
within complete programs are not listed separately. 


Name Type Section Description 

AVERAGE part 6.4 finds mean and maximum 

BIGDEAL prog 10.3 bridge hands and scoring 

BIRTHDAY prog 2.1 calculates week day from date 
CALENDAR prog 9.x produces calendar for any year 1900..2099 
CHANCES prog 7.6 binomial probability calculations 
CHEMICAL prog 9.x determines atomic weight of a compound 
CHOPPER prog 11.4 simplifies boolean expressions 
COUNTER1 prog 8.2 frequency count for e 

COUNTER2 prog 8.2 frequency count for vowels 

COUNTER3 prog 8.2 frequency count for letters 

COUNTER4 prog 8.3 frequency count for letter pairs 
DECLINE prog 6.7 tabulates asset depreciation 
DRAWLINE proc 7.1 prints various kinds of underlining 
DWELLING part 11.2 skeleton of estate agent system 
EASTERS prog 6.x calculates date of Easter for any year 
EASY prog 7.1 prints table of square roots and logs 
FACTORY prog 8.x computes and prints very large factorials 
FOOTBALL prog 15.3 five-a-side soccer simulation 

GOMOKU1 prog 16.3 plays ancient oriental game of Go-Moku 
HEAPSORT proc 13.4 Heapsort, also called treesort3 
HIGHFACT func 7.5 highest common factor (recursive) 
INFLATER prog 5.3 demonstrates effect of inflation 
INTCOUNT prog 9.2 number tallying 

JOURNEYS prog 14.6 shortest route on Underground or Metro 


LINKLIST prog 11.3 reverses a character sequence 

MAKEFILE prog 10.x creates file of student records 
MAXIMIZE func 16.4 alpha-beta look-ahead function 
MERGING part 9.4 two-way merge process 

MINIMIZE func 16.4 alpha-beta look-ahead function 
MORTGAGE prog 7.x works out monthly loan repayments 
PARKING prog 5.x works out garage parking fee 
PICKSORT prog 8.x simple selection sort 

PIVALUE prog 6.3 calculates approximation to pi 
PRIMENOS prog 8.x prints table of primes and prime factors 
PUTWORDS prog 9.2 writes word list to file 

QUANTILE proc 8.7 median and other percentiles 

QUIX proc 13.4 quicksort internal sorting algorithm 
SEARCHER prog 9.3 binary search for acronyms 
SELECTOR prog 10.2 selection of student records 
SHERLOCK prog 10.1 logical detective work 

SHOWOFF prog 8.6 displays chess game as it progresses 
SIMPSON prog 11.1 integration by Simpson’s rule 
SPEEDMPH func 7.4 speed in mph from metres and seconds 
SUNSHINE prog 8.x peaks and troughs in star’s light emission 
TAPESORT prog 13.3 merge sorting of external files 
VELOCITY prog 5.x computes speeds from distance and time 
VERYEASY prog 7.1 prints table of square roots and logs 
YARDMILE proc 7.2 conversion from yards to miles 

Type key: 


func function 

part program fragment 
proc procedure 

prog complete program. 


Section .x refers to the Solutions to Selected Exercises: for example 8.x means 
that the item can be found as the answer to an exercise set in Chapter 8. 


APPENDIX D 


Syntax diagrams 


These are based on the diagrams given in the Pascal User Manual and Report 
(Jensen and Wirth, 1975). In the event of any discrepancy between a diagram here 
and one in the body of the text, the version here is more comprehensive. 
[Syntax diagrams, in order: Program; Declarations; Subprogram definition; 
Parameter list; Statement; Term; Simple expression; Expression; Variable; 
Factor; Unsigned constant; Constant; Simple type; Type; Field list; 
Variant part; Unsigned integer; Unsigned number; Digit.] 


Solutions to selected exercises 


Model answers are given here to all the odd-numbered exercises in the 
even-numbered chapters and to the even-numbered exercises in odd-numbered 
chapters. Program listings are accompanied, where appropriate, by sample 
printouts. All the programs were run on the DEC System-10. 

There is always more than one way of writing a program, so these answers, 
though correct, should not be regarded as definitive. 


Chapter 5, Exercise 2 


program velocity;      (* chapter 5, exercise 2 *) 
(* calculates speeds given distance & time *) 


const 
  yard = 0.9144;   mile = 1609.344;   (* conversion factors *) 
  hour = 3600;                        (* seconds in an hour *) 


var 
  dist,mins,secs : 0..maxint; 
  ms,kh,mh : real; 


begin 
  write('distance travelled in metres ? '); 
  read(dist); 
  write('time taken in minutes & seconds ? '); 
  read(mins,secs); 
  (* no input checking as the if statement 
     not covered until chapter 6 *) 


  secs := secs + mins*60; 

  ms := dist / secs;        (* metres per second *) 
  kh := ms/1000 * hour;     (* km per hour *) 
  mh := ms/mile * hour;     (* m.p.h. *) 


  writeln; 
  writeln('you took ', secs:7, ' seconds to go', dist:7, ' metres.'); 
  writeln('your average speed was:'); 
  writeln('m/s':10, 'k/h':10, 'm.p.h.':10); 
  writeln(ms:10:4, kh:10:4, mh:10:4); 
  writeln; 
  writeln('keep right on to the end of the road.'); 


end. 


Chapter 5, Exercise 4 


program parking;       (* chapter 5, exercise 4 *) 
(* program to work out garage parking fee *) 


const 
  perhour = 4.75;      (* charge per hour or fraction thereof *) 
var 
  come, gone : 0..2400; 
  mins, t1, t2 : 0..1440; 
  hour : 0..24; 
  (* subranges afford partial error protection *) 


begin 
  writeln('please give times in military hours:'); 
  writeln('e.g. 1320 for 1:20 p.m.'); 
  writeln; 


  write('when did you arrive ?');   read(come); 
  write('when did you depart ?');   read(gone); 


  hour := come div 100;              (* 1st 2 digits *) 
  t1 := come mod 100 + hour*60;      (* entry time *) 
  hour := gone div 100; 
  t2 := gone mod 100 + hour*60;      (* exit time *) 
  mins := t2 - t1; 
  hour := mins div 60 + 1; 
  (* charges for incomplete hours as well *) 


  writeln; 
  writeln('you were parked for ', mins:4, ' minutes,'); 
  writeln('which counts as ', hour:8, ' hours.'); 
  writeln('that will cost : $', hour*perhour :8:2); 
  writeln; 
  writeln('drive carefully!'); 


end. 
Chapter 6, Exercise 1 


program easters;       (* chapter 6, exercise 1 *) 
(* calculation of date of easter sunday *) 
var 
  cent, clav, d, e, gold, greg, moon, year : integer; 


begin 
  writeln('program to compute date of easter sunday:'); 
  writeln('give a date before 1582 to end execution.'); 


  repeat 
    writeln; 
    write('which year? '); 
    read(year); 
    if year > 4902 
      then 
        writeln('you should live so long!') 
      else 
        begin 
          writeln('easter day falls on:'); 
          cent := year div 100 + 1; 
          greg := 3 * cent div 4 - 12; 
          gold := year mod 19 + 1; 
          clav := (8 * cent + 5) div 25 - 5 - greg; 
          e := 5 * year div 4 - greg - 10; 
          moon := (11 * gold + 20 + clav) mod 30; 
          if (moon = 25) and (gold > 11) 
             or (moon = 24) then 
            moon := moon + 1; 
          d := 44 - moon; 
          if d < 21 then d := d + 30; 
          d := d + 7 - (d + e) mod 7; 
          if d > 31 
            then 
              writeln('april', d-31:4) 
            else 
              writeln('march', d:4); 
        end; 
  until year < 1582;      (* before present calendar *) 


  writeln('merry christmas!'); 
end. 




Sample output 


[LNKXCT EASTER execution] 
PROGRAM TO COMPUTE DATE OF EASTER SUNDAY: 
GIVE A DATE BEFORE 1582 TO END EXECUTION. 


WHICH YEAR? 1982 
EASTER DAY FALLS ON: 
APRIL 11 


WHICH YEAR? 1948 
EASTER DAY FALLS ON: 
MARCH 28 


WHICH YEAR? 1999 
EASTER DAY FALLS ON: 
APRIL 4 


WHICH YEAR? 4444 
EASTER DAY FALLS ON: 
APRIL 13 


WHICH YEAR? 5555 
YOU SHOULD LIVE SO LONG! 


WHICH YEAR? 0 

EASTER DAY FALLS ON: 
APRIL 17 

MERRY CHRISTMAS! 


Chapter 7, Exercise 2 


program mortgage;      (* chapter 7, exercise 2 *) 
(* calculates monthly loan repayments *) 


type cardinal = 0..maxint;      (* nonnegative numbers *) 
var 
  ai, loan, p, rate : real; 
  payments, y : cardinal; 


function expo (a : real; b : cardinal) : real; 
(* raises a to the power of b *) 
var e : real;   i : cardinal; 


begin 
  e := 1.0; 
  for i := 1 to b do 
    e := e * a; 
  expo := e; 
end; (* expo *) 


function sums (r : real; t : cardinal) : real; 
(* sums series of powers *) 
var e,s : real;   m : cardinal; 


begin 
  s := 0;   e := 1.0; 
  for m := 0 to t-1 do 
    begin 
      s := s + e; 
      e := e * r; 
    end;   (* more efficient way of getting powers *) 
  sums := s; 
end; (* sums *) 


begin (* main line *) 
  writeln('mortgage calculations:'); 
  write('amount of loan ? ');   read(loan); 
  write('annual interest? ');   read(ai); 
  write('years to pay it? ');   read(y); 
  writeln; 


  if (loan > 0) and (ai > 0) and (ai < 1000) 
     and (y > 0) then 
    begin (* normal calculations *) 
      rate := 1 + ai/1200;      (* monthly interest factor *) 
      payments := y * 12; 
      p := loan * expo(rate, payments) / sums(rate,payments); 


      writeln('to pay off', loan:12:2, ' megabucks'); 
      writeln('at', ai:7:2, '% in', y:8, ' earth years'); 
      writeln('you must pay', p:10:2, ' each month.'); 
    end 
  else (* suspicious input *) 
    begin 
      writeln('warning:', chr(7)); 
      writeln('entering infinite improbability field!'); 
      writeln('activate thought-shields immediately!!'); 
    end; 
  writeln('that''s capitalism for you.'); 


end. 




Sample output 


MORTGAGE CALCULATIONS: 
AMOUNT OF LOAN ? 1000 
ANNUAL INTEREST? 100 
YEARS TO PAY IT? 10 


TO PAY OFF 1000.00 MEGABUCKS 
AT 100.00% IN 10 EARTH YEARS 
YOU MUST PAY     83.34 EACH MONTH. 
THAT’S CAPITALISM FOR YOU. 


EXIT 


EX 

LINK: Loading 

[LNKXCT MORTGA execution] 
MORTGAGE CALCULATIONS: 
AMOUNT OF LOAN ? 12345 
ANNUAL INTEREST? 0 
YEARS TO PAY IT? 999 


WARNING: 

ENTERING INFINITE IMPROBABILITY FIELD! 
ACTIVATE THOUGHT-SHIELDS IMMEDIATELY!! 
THAT'S CAPITALISM FOR YOU. 


Chapter 8, Exercise 1 


program sunshine;      (* chapter 8, exercise 1 *) 
(* analyses fluctuations in star's emission *) 


type year = 66..80;    (* fixed observation period *) 
var 
  y, y1, y2 : year; 
  a1, amin, amax, d1, dmin, dmax : year; 
  emission : array [year] of real; 
  (* starlight readings, 1 per year *) 


begin (* main line *) 
  writeln('starlight analysis program.'); 
  writeln; 
  write('first year of observations (66-80) ? '); 
  read(y1); 
  write('final year of observations (', y1:2, '-80) ? '); 
  read(y2); 
  if y1 > y2 then 
    begin 
      writeln('are you in a time-warp??'); 
      y := y1;   y1 := y2;   y2 := y;    (* swap *) 
    end; 


  (* get data *) 
  for y := y1 to y2 do 
    begin 
      write(y:2, ':');   read(emission[y]); 
    end; 


  (* first initialize extreme values *) 
  amin := y1;   amax := y1; 
  dmin := y1;   dmax := y1; 
  a1 := y1;     d1 := y1; 


  (* now the calculations *) 
  for y := y1+1 to y2 do 
    if emission[y] > emission[y-1] then 
      begin (* ascending *) 
        d1 := y; 
        if (y - a1) > (amax - amin) then 
          begin (* longer upward series *) 
            amin := a1;   amax := y; 
          end; 
        if y < y2 then      (* test for peak *) 
          if emission[y] > emission[y+1] then 
            writeln('** peak at', y:4, ' ->', emission[y]:8:4); 
      end 
    else if emission[y] < emission[y-1] then 
      begin (* descending *) 
        a1 := y; 
        if (y - d1) > (dmax - dmin) then 
          begin (* longer downward series *) 
            dmin := d1;   dmax := y; 
          end; 
        if y < y2 then      (* test for trough *) 
          if emission[y] < emission[y+1] then 
            writeln('** trough at ', y:2, ' ->', emission[y]:8:4); 
      end 
    else (* neither up nor down *) 
      begin 
        a1 := y;   d1 := y; 
        (* ends both kinds of sequence *) 
      end; 


  (* output *) 
  writeln; 
  writeln('longest ascending sequence from ', amin:4, ' to ', amax:2); 
  writeln('longest descending sequence from', dmin:4, ' to ', dmax:2); 


end. 


Chapter 8, Exercise 3 


program picksort; (#* chapter 8, exercise 3 *) 
(* tests simple selection sort *) 


type cardinal = 0. . maxint; 
ivector = array [1..100] of integer; 


var 
i, nN : cardinal; 
v : ivector; 


procedure sort (var a : ivector; n : cardinal);
(* arranges a[ ] in descending order *)
var i,imin : cardinal; ax : integer;


begin
while n > 1 do
begin
imin :=n; (* location of minimum *)
for i := 1 to n-1 do
if a[i] < a[imin] then imin := i;
(* smallest at imin, now swap *)
ax :=a[n]; a[n] :=a[imin]; a[imin] := ax;
(* smallest now at end *)
n:=n-1; (* one fewer to look at *)
end;
end;


begin (* main line *)
writeln('test of selection sort routine:');
write('how many items ?'); read(n);
writeln('please enter ',n:4,' numbers:');
for i :=1 to n do
read(v[i]);
sort(v,n);
writeln;
writeln('numbers in descending order:');
for i:=1 to n do
if i mod 8=0 then
writeln(v[i]:8)
else
write(v[i]:8);


end. 


Chapter 8, Exercise 5 


program primenos; (* chapter 8, exercise 5 *)
(* demonstration of recursion *)


var a : integer; (* number under test *)
isprime : boolean;
counter : integer; (* number of primes found *)
atop : integer;
p : array [1..5000] of integer; (* table of primes *)


procedure factors (n: integer); (* may call itself *)
var j,k : integer;
division : boolean;
begin
if n < 4 then write(n:4) (* 1..3 are prime *)
else begin
j:=1;
repeat j :=j +1;
k :=p[j]; (* next prime as possible factor *)
division := (n mod k) =0; (* divisible by k ? *)
until division or ((k*k) > n) or (j = counter);
if division then begin
isprime := false;
factors(n div k); (* recursive call to find further factors *)
write('*',k:4); (* k is a factor *)
end
else write(n:4)
end;
end; (* of factors *)


begin (* main line *)
counter :=1; p[counter] := 1;
write('highest number to be factored ? ');
read(atop);
for a :=2 to atop do begin
write(a:7,'=');
isprime := true;
factors (a);
if isprime then begin
write('* 1 + prime **');
counter := counter + 1;
p[counter] := a;
end;
writeln;
end; (* for loop *)


writeln(' in ', atop:8,' numbers there were', counter:4,' primes.');


end. 


Chapter 8, Exercise 7 


program factory; (* chapter 8, exercise 7 *)
(* generates & prints large factorials *)


const
enddigit = 100; (* no. of digits in large no. *)
linesize = 64; (* width of paper *)


type
smallint = 0..999;
cardinal = 0..maxint;
largenum = array [1..enddigit] of smallint;


var
i, n, sp : cardinal;
fact : largenum;


procedure zero (var n : largenum);
(* sets n to big zero *)
var i : cardinal;


begin
for i :=1 to enddigit do n[i] :=0;
end;


procedure bigmult (var n: largenum; m: cardinal);
(* multiplies big number n by m *)
var i, product : cardinal;
q,r : smallint;


begin
q:=0; (* initial carry *)
for i := enddigit downto 1 do
begin
product := n[i]*m+q;
q := product div 1000; (* carry out *)
r := product mod 1000; (* remainder *)
(* each cell holds 3 decimal digits *)
n[i] :=r;
end;
if q > 0 then
begin
writeln('overflow in bigmult!');
halt;
end;
end; (* bigmult *)


procedure bigprint (var n : largenum);
(* prints out a long integer *)
var i, j : cardinal;


begin
j:= 1;
while (j < enddigit) and (n[j] = 0) do
j := j+1;
(* finds first nonzero digit *)
for i := j to enddigit do
begin
sp :=sp + 4;
if sp > linesize then
begin
sp := 4; writeln;
end;
if n[i] >99 then write(n[i]:4)
else if n[i] >9 then write(' 0', n[i]:2)
else write(' 00', n[i]:1);
end;
writeln;
end; (* bigprint *)


begin (* main line *)
write('factorial up to what ?'); read(n);
writeln;


zero(fact); fact[enddigit] := 1; (* fact set to 1 *)


writeln('   n   factorial(n)');
writeln; 
for i := 1 to n do
begin
write(i:4, ' -- ');
bigmult(fact,i);
sp := 8; (* character count for bigprint *)
bigprint (fact);
end;
writeln; 


end. 


Chapter 9, Exercise 2 


program calendar(listfile); (* chapter 9, exercise 2 *) 
(* prints calendar for any year *) 


type mmat = array[1..6,0..6] of integer; 
m3 = array[1..3] of mmat; 
days = (mon,tue,wed,thu,fri,sat,sun); 

var name : array[1..12] of alfa;
last : array [1..12] of integer;
mat3 : m3; d1,n,mn,l,year : integer;
listfile : text; 


procedure fill(mn,day1 : integer; var month : mmat);
(* fills 6-week by 7-day table with day numbers or zero *)
var i,w,d,k,nd : integer;
begin
nd :=last[mn]; (* last day of this month *)
w:=1; d:=day1; k:=0;
for i := 0 to day1 - 1 do month[1,i] :=0;


for i:= day1 to 41 do
begin
k:=k+1; (* day number *)
if k<=nd then month[w,d] :=k else month[w,d] :=0;
d:=d+1;
if d > 6 then
begin d:=0; w:=w+1 end

end

end; (* of fill *)


procedure out3(m : m3; this : integer);
(* prints 3 months side by side *)
var w,i,d,n : integer; dn : days;
begin (* headings first *)
writeln(listfile);
writeln(listfile,' ':12, name[this-2],' ':30, name[this-1],' ':30, name[this]);
for i:=1 to 120 do write(listfile,'*'); writeln(listfile);
for i:=1 to 3 do
begin
for dn := mon to sun do write(listfile,dn:5);
write(listfile,'     ');
end;
writeln(listfile);


for w := 1 to 6 do
begin
for i:=1 to 3 do
begin
for d:=0 to 6 do (* 7 days of week *)
if m[i,w,d] <=0 then write(listfile,'     ')
else write(listfile, m[i,w,d]:5);
write(listfile,'     ');
end;
writeln(listfile);
end;
writeln(listfile);
end; (* of out3 *)


function leapyear (y : integer) : boolean;
(* true if y is a leap year *)
begin
if (y mod 400 = 0) or (y mod 4 = 0) and (y mod 100 <> 0)
then leapyear := true
else leapyear := false
end; (* of leap year function *)


function firstday (y : integer) : integer;
(* calculates days since 1 jan 1900 *)
var y1,n,l : integer;
begin
y1 :=y - 1900;
n:=y1 * 365;


l := (y1 div 4) - (y1 div 100) + (y1 + 300) div 400;
(* l is no. of leap years since 1900 to this year *)
if leapyear(y) then l :=l-1; (* don't count this year *)
firstday :=n + l;
end; (* of firstday *)


procedure init;
(* fills global vectors for 12 months *)
begin
name[1] := 'january   '; last[1] := 31;
name[2] := 'february  '; last[2] := 28;
(* leap year taken care of outside *)
name[3] := 'march     '; last[3] := 31;
name[4] := 'april     '; last[4] := 30;
name[5] := 'may       '; last[5] := 31;
name[6] := 'june      '; last[6] := 30;
name[7] := 'july      '; last[7] := 31;
name[8] := 'august    '; last[8] := 31;
name[9] := 'september '; last[9] := 30;
name[10] := 'october   '; last[10] := 31;
name[11] := 'november  '; last[11] := 30;
name[12] := 'december  '; last[12] := 31;
end; (* of initialization *)


(* main program *)
begin init; (* fill data arrays first *)


writeln('for which year do you want a calendar?');
read(year);
if year < 1900 then writeln('too early!')
else
begin
rewrite(listfile);
if leapyear(year) then last[2] := 29; (* feb *)
d1 :=firstday(year); (* days since 1 jan 1900 *)
d1:=d1 mod 7; (* day coded as 0..6 = mon..sun *)
writeln('first day numbered ',d1:4);
(* just a check *)
writeln(listfile,'**** calendar for: ',year:7,' *****');


mn:=1; (* month number, 1..12 *)
repeat
for n:=1 to 3 do
begin
fill(mn,d1,mat3[n]); (* stores 1 month at a time *)
d1 := (d1+last[mn]) mod 7; (* 1st day of new month *)
mn:=mn+1; (* next month *)
end;
out3(mat3,mn-1); (* printout of three months' data *)
until mn > 12;
writeln(listfile);
writeln(listfile,'happy new year!');
end;
end. 


Chapter 9, Exercise 4 


program chemical (elements); (* chapter 9, exercise 4 *)
type atom = (h,he,li,be,b,c,n,o,f,ne,na,mg,al,si,p,s,cl,ar,k,ca,sc,ti,v,cr,mn,fe,co,ni,cu,
zn,ga,ge,as,br,kr,mo,ag,sn,sb,i,xe,pt,au,hg,pb,ra,u,np,pu,md);


var wt : array [atom] of real;
thisatom : atom;
nval : integer;
counter : integer; w : real;
elements : text; (* input file of atomic weight data *)


procedure filltab; (* fills global table wt with atomic weights *)
var a : atom;
begin
reset(elements);
while not eof(elements) do begin
read(elements,a);
readln(elements,wt[a]);
end; (* all elements should be on file *)
end;


procedure init; (* instructions for user *)

var a : atom;
zz : integer; (* count of items per line *)

begin
writeln('this program computes molecular weights.');
writeln('you type in a chemical formula, e.g.');
writeln(' h 2 o 1');
writeln('for water, or');
writeln(' na 1 cl 1');
writeln(' for salt, using spaces to separate');
writeln('atom names from numbers (even 1, must');
writeln('be mentioned) and pressing return');
writeln('to finish the input line.');
writeln(' when you have done enough just');
writeln(' give a blank line on its own to end input.');
writeln;
writeln(' the atoms known are:');
write(' '); zz:=0;
for a :=h to md do
begin
write(a:4);
zz :=zz+1;
if zz mod 16 = 0 then
begin writeln; write(' ');
end; (* line feed etc. *)
end;
writeln;
end;


begin (* main line *)
filltab; (* get input data *)
init; (* tell user what to do *)


repeat
writeln;
write('compound is ? ');
if eoln then readln;
w :=0.0; counter := 0;
while not eoln do
begin
read(thisatom); (* get atom name *)
read(nval); (* get atom's frequency *)
counter := counter + 1;
w :=w+nval*wt[thisatom]; (* add in wt*frequency *)
end;
(* of input line *)


writeln('molecular weight is ',w:10:4);
until counter =0; (* no atoms given at all *)


end.


Sample output


ELEMENTS = ATOM.DAT 
THIS PROGRAM COMPUTES MOLECULAR WEIGHTS. 
YOU TYPE IN A CHEMICAL FORMULA, E.G. 
H 2 O 1
FOR WATER, OR 
NA 1 CL 1
FOR SALT, USING SPACES TO SEPARATE 
ATOM NAMES FROM NUMBERS (EVEN 1 MUST 
BE MENTIONED) AND PRESSING RETURN 
TO FINISH THE INPUT LINE. 
WHEN YOU HAVE DONE ENOUGH JUST 
GIVE A BLANK LINE ON ITS OWN TO END INPUT. 


THE ATOMS KNOWN ARE: 
H HE LI BE B C N O F NE NA MG AL SI P 
S CL AR K CA SC TI V CR MN FE CO NI CU ZN 
GA GE AS BR KR MO AG SN SB I XE PT AU HG PB
RA U NP PU MD 


COMPOUND IS ? H 1 CL 1
MOLECULAR WEIGHT IS 36.4610 [Hydrochloric Acid]


COMPOUND IS ? H 2 S 1 O 4
MOLECULAR WEIGHT IS 98.0775 [Sulphuric Acid]


COMPOUND IS ? C 2 H 5 O 1 H 1
MOLECULAR WEIGHT IS 46.0695 [Ethyl Alcohol]


COMPOUND IS ? C 1 H 3 O 1 H 1
MOLECULAR WEIGHT IS 32.0424 [Methyl Alcohol]


COMPOUND IS ? 
MOLECULAR WEIGHT IS 0.0000 


Chapter 10, Exercise 1 


program makefile (studtext, studfile); (* chapter 10, exercise 1 *)
(* simple file creation program *)
(* some error checking, but not enough for live usage *)


label 99; (* emergency exit *)


type
cats = (fail,pass,good,vergood);
date = record
d:1..31; m:1..12; year:0..9999;
end;
student = record
surname : packed array [1..20] of char;
initial1, initial2 : char;
birthday : date;
subjects : set of (math, stat, computer, physics);
mark : array [2..3,1..8] of 0..100;
male : boolean;
category : cats;
end;


var
studtext : text;
studfile : file of student;
stud : student;
s : 0..maxint;
warning : boolean;


procedure skip (var f : text);
(* skips over spaces and commas on input *)


begin
while f^ in [' ', ','] do
get(f);
end; (* skip *)


procedure readrec (var f : text; var s : student);
(* assembles record s from text data *)
label 99; (* for error exit *)
var n,m,y : integer;


begin
skip(f);
n :=0;
repeat
n:=n+1;
s.surname[n] := f^;
get(f);
until (n >= 20) or (f^ =',') or eof(f);
if eof(f) then (* error *)
begin
warning := true; goto 99;
end;
skip(f);
s.initial1 :=f^; get(f);
s.initial2 :=f^; get(f);
skip(f);
with s.birthday do read(f,d,m,year);
(* reads date of birth *)
skip(f);
s.male :=f^ <> 'f'; (* default to male *)
readln(f); (* personal details on line 1 *)
if eof(f) then (* premature ending *)
begin
warning := true; goto 99;
end;
readln(f,s.subjects);
(* set input on line 2 *)
(* lines 3 & 4 have marks, ended by -1 if less than 16 *)
n:=0; y:=2;
repeat
read(f,m);
if (m >= 0) and (m <= 100) then
begin
n:=n+1;
if n > 8 then
begin n:=1; y:=y+1; end;
s.mark[y,n] :=m;
end;
until (m < 0) or ((y = 3) and (n > 7));
readln(f);
99: (* way out *)


end; (* readrec *)


begin (* main program *)


reset(studtext); rewrite(studfile);
s:=0; warning := false;
while not eof(studtext) do
begin
s:=s+1;
readrec(studtext,stud);
if warning then (* quit *)
begin
writeln('halted by error in record',s:4);
writeln('please check data file.');
goto 99;
end;
studfile^ := stud;
put(studfile);
end;


writeln(s:4,' records transferred to file.');


99:
end.